feat!(ethexe): malachite by grishasobol · Pull Request #5397 · gear-tech/gear

grishasobol · 2026-04-30T11:29:16Z

No description provided.

Intermediate state before switching the producer to pull quarantine status directly from the Database. This commit is about to be superseded.

…atabase Replace the rolling eth_head_history in State with a direct read of the current EB chain head from DBGlobals::latest_synced_block. New quarantine module exposes two helpers built on top of ethexe-db: - anchor(db, q): producer picks the youngest EB that has ≥ q canonical descendants, matching ethexe-compute's find_canonical_events_post_quarantine semantics. - verify_passed(db, candidate, q): validators reject a proposal whose AdvanceTillEthereumBlock hash isn't an ancestor of the local head at depth ≥ q. Genesis is accepted unconditionally so the short-chain fallback stays consistent between the two sides. State::validate_proposal_parts now enforces exactly one AdvanceTillEthereumBlock tx and runs it through verify_passed; the proposer path (app::GetValue) calls State::quarantine_anchor and falls back to the genesis hash when the DB walk fails (e.g. we haven't synced enough blocks yet). The chain_head_tx/rx mpsc is gone along with MalachiteService::receive_new_chain_head and the call site in ethexe-service's event loop — the producer reads DB state directly at GetValue time, which is also what gives validators a definition of "the local view" they can compare a proposal against. MalachiteConfig renames quarantine_depth: u32 to canonical_quarantine: u8 so the same value flows end-to-end between Malachite and ComputeConfig; default is ethexe_common::gear::CANONICAL_QUARANTINE. MalachiteService::new now takes Database; ethexe-service passes db.clone() on the live path and on the test harness. No changes to block/transaction shape — this commit is strictly about how the anchor is chosen and verified.

Switch the producer and validators from DBGlobals::latest_synced_block to the latest SimpleBlockData received via the observer event stream. The block-header walk still reads ethexe-db, but the reference point is now `State::latest_received_head: Option<SimpleBlockData>`, overwritten on every MalachiteService::receive_new_chain_head call. A dedicated mpsc carries the chain-head updates into the app task; no history is retained — only the most recent value. `latest_synced_block` trails the event stream because it only updates after extra sync processing, so it was producing stale anchors. `ethexe-service`'s event loop now passes the `Observer::Block` payload to both `consensus` and `malachite`. quarantine::anchor now returns `Option<H256>`: `None` when the local chain is still within `canonical_quarantine` of genesis. On that signal the producer simply omits the `AdvanceTillEthereumBlock` tx from the MB — no more genesis fallback. validate_proposal_parts tolerates zero AdvanceTillEthereumBlock txs (legal producer choice), rejects two+, and for one runs the verify against the local latest head (failing when no head has been received yet). quarantine::verify_passed lost its genesis-is-always-ok special case, which was only needed to accommodate the fallback we just removed.

…ash dedup InjectedTxMempool now knows about reference_block mortality, matching the rules ethexe-consensus already enforces (tx_validation.rs): - insert rejects a tx when * its hash is in the seen-hash table (already committed within VALIDITY_WINDOW), or * its reference_block is not yet in the DB, or * reference_block.height + VALIDITY_WINDOW ≤ latest_head.height, or * the pool is at DEFAULT_POOL_CAPACITY (10_000). - set_chain_head(head) is the single GC trigger: it overwrites the tracked head height and purges both the pool and the seen map of entries whose reference_block has aged out. - fetch(head, _gas_budget) is now non-destructive. It returns only txs whose reference_block is a canonical ancestor of `head` within VALIDITY_WINDOW steps; everything else stays put, so a reorg that flips a branch back in makes the tx eligible again without loss. - forget(committed) moves the given txs out of the pool and records their hashes in the seen map under their reference_block, so a re-gossipped duplicate cannot slip back in before aging out. Malachite builds only on top of finalized blocks, so finalize → forget is sufficient for dedup; there is no round-local state to unwind. Mempool trait gets the new set_chain_head + head-aware fetch. EmptyMempool and the app task are updated accordingly. The app now also forwards observer-delivered chain heads into the mempool and, on AppMsg::Finalized, extracts the Injected(..) variants out of the committed SequencerBlock and hands them to forget — for that State::commit now returns the committed block.

Variant A of the validator-identity unification. All of the changes are local to ethexe-malachite + ethexe-service; no upstream malachite crate is forked or patched. context.rs - type SigningScheme = K256 (from malachitebft-signing-ecdsa, using the RustCrypto k256 curve backend). - Address becomes a newtype over gsigner::secp256k1::Address; from_public_key does keccak256(uncompressed_pubkey[1..])[12..] — same derivation the rest of ethexe uses on-chain. - PublicKey / Signature / PrivateKey are the corresponding malachitebft-signing-ecdsa wrappers around k256 types. - Validator / ValidatorSet / Vote / Proposal / ProposalPart keep their shape, minus the ed25519-specific Address::from_public_key helper. Validator gains with_address(…) so genesis entries can be loaded without recomputing the address. - EthexeSigner is now an ECDSA signer backed by a PrivateKey<K256>; signs/verifies votes, proposals, extensions. The same 32-byte secret will later back libp2p and on-chain signing too. genesis.rs (new) - MalachiteGenesis { validators: Vec<GenesisValidator> } loaded from home_dir/genesis.json. - Each entry is consistency-checked: declared address must equal the one derived from the declared public key. Mismatches error out early. - to_validator_set() materializes a sorted, deterministic ValidatorSet. lib.rs - MalachiteService::new now takes (signer: gsigner::Signer<Secp256k1>, validator_pub_key) — the key is the ethexe validator key. The 32-byte secret is exported once from the keyring and drives: * Malachite votes/proposals (via EthexeSigner), * libp2p identity (Keypair built from libp2p_identity::secp256k1::SecretKey::try_from_bytes), * on-chain commitments (via the shared gsigner::Signer). So a node presents a single identity across all three layers. - node_key.json path / load_or_generate_node_key are gone; peer id is now deterministic from the validator key. - ValidatorSet sourced from genesis.json at init; the service checks that the local validator appears in the set and fails loudly otherwise. ethexe-service - malachite: Option<MalachiteService> — only built when the node has a validator key. Non-validator nodes skip Malachite entirely; the event loop uses maybe_next_some() and the receive_* calls are gated behind if let Some(..). - new() plumbs signer.clone() + validator_pub_key into the MalachiteService; test harness keeps malachite = None (tests don't exercise consensus yet). codec.rs - drops the ed25519_consensus::Signature import, uses context::Signature; SignedMessage raw form carries the wrapped ECDSA signature directly (no .inner() unwrap to k256 types). Cargo - workspace: add malachitebft-signing-ecdsa with features ["k256","rand","serde","std"]. - ethexe-malachite: replace malachitebft-signing-ed25519 with malachitebft-signing-ecdsa, add k256 and libp2p-identity (for building the secp256k1 libp2p keypair), add gsigner.

…ool insert Self-audit fallout: - quarantine::anchor / quarantine::verify_passed now take start_block_hash (from DBGlobals::start_block_hash) instead of genesis_block_hash. Walks cannot cross the oldest block the local DB is guaranteed to have; crossing it would read a parent header that isn't stored. anchor returns Ok(None) when the walk would need to go past start_block before finishing canonical_quarantine steps; verify_passed returns Err, so the validator simply skips voting — that's an acceptable outcome per the design. - mempool::recent_ancestors walks until start_block (previously: until H256::zero or a cycle). Fixes the same bug on the mempool side — a ref_block older than start_block would previously pass the ancestry test via an unbounded walk that relied on DB returning None to stop. - mempool::insert now requires the ref_block to resolve to a header unconditionally. Previously we only checked when a head had been observed, which let stale txs sit in the pool on a fresh node until the first head arrived. Rejecting outright is safer; the sender can re-gossip after our DB catches up. - mempool::is_expired uses saturating_add, guarding against u32 overflow on pathological inputs. - State::genesis_block_hash is gone (it was only used for the anchor fallback in the producer path, which we already removed when quarantine::anchor started returning Option). Producer now just skips AdvanceTillEthereumBlock when anchor says None. No behaviour change for full-sync nodes where start_block == genesis.

…t-paced producer Separate the Malachite libp2p peer_id from the ethexe-network swarm by domain-separated keccak256 derivation from the validator secret — operators still manage one master key, but the two swarms no longer share a peer_id (cleaner observability, no cross-protocol routing ambiguity). The validator key still signs Malachite votes/proofs, so peers tie libp2p identity to the on-chain validator via the existing `sign_validator_proof` flow. Wire `--malachite-persistent-peer` through CLI / `MalachiteCliConfig` / `MalachiteConfig` / Malachite's `P2pConfig::persistent_peers` so multi-node deployments can be brought up without the (still disabled) discovery layer. New `ethexe malachite peer-id <pubkey>` subcommand derives the libp2p peer_id offline so operators can populate multiaddrs without having to boot a node first. Producer pacing rework: - `LinearTimeouts.propose = SLOT_DURATION + 1s`. Non-proposer tolerates one ETH slot of silence before incrementing the round. - On `GetValue` cache miss, the proposer evaluates a four-way decision tree based on the parent MB's `last_advanced_block`: * candidate quarantine-passed EB is a strict descendant ⇒ advance + propose immediately; * candidate equals or is unreachable from the parent's anchor (rare deep reorg) ⇒ log::error + skip the advance for this MB; * no advance but mempool has txs ⇒ propose with txs; * nothing to propose ⇒ wait until either a chain-head event or `Mempool::wait_for_new_tx` fires (no deadline — ETH delivers a fresh slot every ~12s in normal operation). - `last_advanced_block` is propagated forward on every BlockProposal by the service handler: latest `AdvanceTillEthereumBlock` in the MB's transactions wins, otherwise the parent MB's value is inherited (zero for the genesis MB). - `is_strict_descendant_of` quarantine helper + unit tests. - `Mempool::wait_for_new_tx` (Notify-backed in `InjectedTxMempool`, pending-forever in `EmptyMempool`). - `MbMeta` gains `last_advanced_block: H256`. Finalization is intentionally not paced: `target_time` stays `None` in `HeightParams`, so a successful commit hits the application immediately. The slot-based pacing applies only to the propose phase.

…, SequencerBlock hash Backfill unit tests for pieces that landed in earlier commits without coverage: - InjectedTxMempool — 9 cases covering insert/fetch/forget/wakeup contracts (unknown ref-block rejection, hash dedup, capacity cap, set_chain_head purge, canonical-ancestor filter, Notify-based `wait_for_new_tx` on success / non-wakeup on rejected insert). - MalachiteGenesis::load — 6 cases covering missing-file, empty set, address/pubkey-mismatch rejection, voting-power default, consistent-load happy path, and `to_validator_set` count. - libp2p key derivation — `derive_libp2p_secret` is deterministic and distinct from the validator secret it was derived from; `malachite_libp2p_peer_id` is a pure function of the validator secret (operators rely on offline derivation). - SequencerBlock — hash is content-addressed (changes with parent or transactions), `Transaction::tag()` mapping is pinned, SCALE round-trip preserves the hash. Adds `tempfile` to ethexe-malachite dev-dependencies for genesis file-load tests. No production-code changes — the few logic touches are in test-only scope.

…rticipant Reshapes ethexe-consensus around malachite-finalized sequencer blocks (MBs): - ChainCommitment.head is now an MB hash (H256), not announce hash. - BatchCommitmentValidationRequest.head: Option<H256>. - BlockMeta.last_committed_announce → last_committed_mb. - Solidity event AnnouncesCommitted → ChainCommitted; ABI artifacts refreshed. - Validator state machine reduced to WaitForEthBlock / Coordinator / Participant. Producer + Subordinate + announce sync are gone. - Coordinator aggregates outcomes from finalized MBs walking mb_meta.parent_mb_hash and submits the existing BatchCommitment shape to Router unchanged. - Participant accepts request.head if it equals or is an ancestor of latest_finalized_mb, otherwise drops the signature with a warning. - Coordinator-side aggregation has a configurable delay (CLI flag --coordinator-aggregation-delay-ms, default 1500ms) so participants can catch up on the same chain head and the previous MB has time to finish executing. - Empty MB outcomes never produce a chain commitment on their own; batches without chain/codes/validators/rewards are skipped. - ConnectService is gone — non-validator nodes run with consensus = None. - timelines.block_producer_at → timelines.block_coordinator_at. DB migrations are not preserved (POC); fast_sync is parked behind a no-op until the MB-driven recovery path lands. Service- and batch-level tests are stripped and will be reintroduced in the next commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Bump EXPECTED_TYPE_INFO_HASH for BlockMeta + DBGlobals shape changes (db.rs, migrations/v3.rs). - ethexe-rpc: rename `calculate_next_producer` → `next_coordinator` in the test module to follow the production rename. - ethexe-service: thread the new `coordinator_aggregation_delay` knob through `NodeConfig` smoke test, drop the `chain_deepness_threshold` field, switch `ConnectService` users to `consensus = None`, and rename `block_producer_index_at` → `block_coordinator_index_at`. - The `tests/mod.rs` integration scenarios (~6k lines, all built on the announce harness that no longer exists) are wrapped in a `#[cfg(any())]` module so they keep parsing. The `utils` sub-module stays compiled because the lib references `tests::utils::TestingEvent`. The cases will be rebuilt against the MB-driven flow in a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Rebuilds the batch round-trip test suite that was deleted along with the announce-driven mocks. New cases cover the same surface as before but are wired against MB chains seeded directly into the database: - accepts_matching_request — create→validate happy path. - rejects_duplicate_code_ids - rejects_unknown_code_in_request - rejects_code_not_processed_yet - rejects_digest_mismatch - rejects_head_mb_not_in_chain — replaces the old "non-best announce" case; the manager rejects when request.head is foreign to the chain. - rejects_head_mb_not_computed — head MB exists but is not yet finalized in the local state. - rejects_empty_batch_request — synthetic empty request fails the "empty batch" gate. - batch_size_limit_exceeded_is_rejected_on_validation - squash_orders_negative_value_transitions_first — sender-first sort preserved end-to-end through the squash and the validation digest matches. Helpers `append_mb`, `setup_mb_chain`, `prepare_canonical_batch`, and `mock_batch_manager` ride on the existing `BlockChain::mock` Eth-side scaffolding, plus a `MockElectionProvider` from `ethexe-ethereum` so the manager's middleware dependency is satisfied even though the covered cases never trigger validators-commitment aggregation. Drops the now-unused `BatchCommitmentManager::replace_limits` helper since each test uses its own manager instance. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds back the validators-commitment cases that were dropped along with the announce mocks. The new test threads a `MockElectionProvider` handle through `mock_batch_manager_with_limits_and_election`, sets up canned election results at the right era boundaries, and walks the manager through: - block before election start → no commitment - block right at election start for era 1 → commits validators1, era 1 - block deeper in era 1 election period → same commitment - same block after marking era 1 already committed → no commitment - block at era 2 election start with only era 0 committed → still commits validators2 for era 2 (warning logged) - block tagged as having era 3 already committed → errors out (committing past the next era is a protocol invariant violation) Also nudges the chain config to a 100s era / 50s election so block indices land on the era boundaries we want. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…sses Brings the integration ping test back to life under the MB-driven flow. Three changes were needed: 1. ethexe-malachite: expose `write_test_genesis(path, signer, pub_keys)` so tests can derive a malachite genesis JSON straight from a gsigner keystore without going through the production CLI/keygen flow. 2. ethexe-service tests: each `Node::start_service` now boots a real single-node `MalachiteService` (binding to 127.0.0.1:0 so parallel tests don't fight over ports), threads a `MockElectionProvider`- backed coordinator through, and hands the service a tempdir as malachite home. `Service::new_from_parts` learned to take an `Option<MalachiteService>` + gas allowance so connect-mode nodes keep their `None`. The `ping` test moved out of the disabled `#[cfg(any())]` block. `WaitForProgramCreation` and `WaitForReplyTo` now share the same force-mine hack `WaitForUploadCode` already had — without periodic `evm_mine` calls Anvil sits idle after the last user tx and the coordinator never gets a fresh ETH head to commit the program reply. 3. Producer: `AdvanceTillEthereumBlock` was emitted as a single tx pointing at the youngest descendant, so events from intermediate blocks (program creations, mirror messages, etc.) silently dropped on the floor. The new `collect_advance_chain` walks from the parent MB's `last_advanced_block` to the candidate and the producer emits one `AdvanceTillEthereumBlock` per block in the gap, capped at 1024 to bound catch-up bursts. ethexe-service eagerly persists the chain-head's header on `ObserverEvent::Block` so the producer's `is_strict_descendant_of` check doesn't race the observer's sync. `cargo nextest run -p ethexe-*`: 327 passed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Single \`AdvanceTillEthereumBlock { eth_block_hash }\` tx is supposed to process events for every Ethereum block from the parent MB's \`last_advanced_block\` (exclusive) up to and including the target — not just the target block alone. The previous wiring (one AdvanceTillEthereumBlock tx per intermediate ETH block, emitted by the producer) was the wrong fix and silently dropped events when the producer-side walk was bypassed. This commit moves the range walk into the processor: - \`Processor::process_transitions\` takes a new \`initial_advanced_block\` argument and tracks a per-MB \`current_anchor\`. Each AdvanceTillEthereumBlock walks the canonical chain (\`parent_hash\`) from \`current_anchor\` to the tx's target, processes events for every block in that range, and bumps the anchor. - \`Processor::collect_advance_chain\` performs the walk; the safety cap is 1024 hops, and a missing parent header partway through the walk is treated as a graceful fence (DB doesn't reach back that far) so the genesis MB still produces transitions when the local chain doesn't extend to genesis-zero. - Two new \`ProcessorError\` variants surface "target header missing" and "walk exceeded cap". - \`mb_compute\` reads parent MB's \`last_advanced_block\` from \`MbMeta\` and passes it through. - The \`ProcessorExt\` trait + the test mock in \`ethexe-compute\` and the smoke test in \`ethexe-processor\` are updated for the new parameter. Producer-side change is reverted: producer emits one \`AdvanceTillEthereumBlock\` per MB pointing at the youngest descendant the quarantine anchor allows, exactly as before this saga started. \`cargo nextest run -p ethexe-*\`: 327 passed, 1 skipped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…n commit Plug two structural gaps that surfaced once the multi-validator test went from N=3 to N=4 (quorum 3-of-4 lets BFT progress without one of the validators, so value-sync actually kicks in): 1. `MalachiteEvent::{BlockProposal, BlockFinalized}` were emitted only on the live path (proposer + completed-stream-at-current-height). Synced and buffered-then-promoted MBs slipped through silently — compute never ran, mb_meta.parent_mb_hash chains had holes, and coordinator-side batch commitment then crashed with "MB chain walk reached genesis". Move the DB writes (`set_mb_block`, `mutate_mb_meta`, `globals_mutate(latest_finalized_mb_hash)`) into the malachite app and gate every event behind a new `synced` flag on `MbMeta`: a block is `synced` only when the `parent_mb_hash` chain back to the genesis MB is fully recorded. Buffered events drain once the chain closes, including a cascade through `pending_by_parent` for out-of-order arrivals. Submit also triggers from `StartedRound`'s pending-parts promotion, the path that was previously silent. 2. The producer's `try_include_chain_commitment` propagated errors from the strict backward walk, so any compute lag past the on-chain commit anchor (or a fresh restart with an empty malachite store) crashed the coordinator. Add `collect_computed_uncommitted_predecessors` — walks the canonical chain back from `mb_head`, returns the longest contiguous *computed* prefix anchored at `last_committed_mb`, falls back to an empty result instead of erroring. Producer commits whatever it has; the rest accumulates for the next batch attempt. Participant keeps the strict variant so an unverifiable request still rejects the signature. Also raise `MalachiteConfig::DEFAULT_GAS_ALLOWANCE` to `DEFAULT_BLOCK_GAS_LIMIT` (4T) — 1B was four orders of magnitude too small for `demo-async`'s round-trips. And add `Drop for MalachiteService` that kills the engine actor and aborts the spawned tasks so a stopped validator's libp2p / consensus tree doesn't keep voting. Test harness: per-validator moniker so logs are distinguishable, and two new integration tests — `multiple_validators_ping` (3-of-3 smoke) and `multiple_validators` (4-of-4 with stop/restart, exercises the new synced and lenient-commit paths end-to-end). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…n porposals

semanticdiff-com · 2026-04-30T11:29:20Z

Review changes with

Changed Files

File	Status
ethexe/prometheus/src/lib.rs	68% smaller
ethexe/node-loader/src/batch.rs	68% smaller
ethexe/ethereum/src/router/mod.rs	63% smaller
ethexe/rpc/src/apis/block.rs	39% smaller
ethexe/compute/src/prepare.rs	38% smaller
ethexe/ethereum/src/abi/events/router.rs	36% smaller
ethexe/db/src/database.rs	35% smaller
ethexe/consensus/src/validator/batch/filler.rs	35% smaller
ethexe/common/src/gear.rs	33% smaller
ethexe/consensus/src/validator/batch/types.rs	26% smaller
ethexe/network/src/db_sync/mod.rs	26% smaller
ethexe/consensus/src/validator/batch/manager.rs	24% smaller
ethexe/common/src/mock.rs	24% smaller
ethexe/common/src/consensus.rs	23% smaller
ethexe/network/src/lib.rs	22% smaller
ethexe/common/src/utils.rs	22% smaller
ethexe/common/src/db.rs	21% smaller
ethexe/consensus/src/validator/batch/tests.rs	21% smaller
ethexe/service/src/tests/utils/env.rs	20% smaller
ethexe/rpc/src/apis/program.rs	20% smaller
ethexe/rpc/src/apis/injected.rs	19% smaller
ethexe/consensus/src/validator/mod.rs	15% smaller
ethexe/consensus/src/validator/core.rs	12% smaller
ethexe/network/src/validator/topic.rs	9% smaller
ethexe/service/tests/smoke.rs	8% smaller
ethexe/consensus/src/validator/coordinator.rs	8% smaller
ethexe/db/src/verifier.rs	8% smaller
ethexe/db/src/migrations/init.rs	8% smaller
ethexe/cli/src/commands/check.rs	8% smaller
ethexe/rpc/src/utils.rs	7% smaller
ethexe/consensus/src/validator/batch/utils.rs	7% smaller
ethexe/consensus/src/utils.rs	7% smaller
ethexe/network/src/db_sync/responses.rs	5% smaller
ethexe/compute/src/service.rs	4% smaller
ethexe/db/src/iterator.rs	4% smaller
ethexe/common/src/network.rs	3% smaller
ethexe/db/src/visitor.rs	3% smaller
ethexe/cli/src/params/node.rs	3% smaller
ethexe/network/src/db_sync/requests.rs	3% smaller
ethexe/processor/src/tests.rs	3% smaller
ethexe/common/src/primitives.rs	2% smaller
ethexe/service/src/config.rs	2% smaller
ethexe/consensus/src/validator/participant.rs	1% smaller
ethexe/cli/src/params/network.rs	1% smaller
ethexe/compute/src/compute.rs	1% smaller
ethexe/service/src/tests/utils/events.rs	1% smaller
Cargo.lock	Unsupported file format
Cargo.toml	Unsupported file format
ethexe/cli/Cargo.toml	Unsupported file format
ethexe/cli/src/commands/malachite.rs	0% smaller
ethexe/cli/src/commands/mod.rs	0% smaller
ethexe/cli/src/params/malachite.rs	0% smaller
ethexe/cli/src/params/mod.rs	0% smaller
ethexe/common/src/events/router.rs	0% smaller
ethexe/common/src/injected.rs	0% smaller
ethexe/common/src/lib.rs	0% smaller
ethexe/common/src/mb.rs	0% smaller
ethexe/compute/Cargo.toml	Unsupported file format
ethexe/compute/src/lib.rs	Unsupported file format
ethexe/compute/src/mb_compute.rs	0% smaller
ethexe/compute/src/tests.rs	0% smaller
ethexe/consensus/src/announces.rs	0% smaller
ethexe/consensus/src/connect/mod.rs	0% smaller
ethexe/consensus/src/lib.rs	Unsupported file format
ethexe/consensus/src/mock.rs	0% smaller
ethexe/consensus/src/tx_validation.rs	0% smaller
ethexe/consensus/src/validator/initial.rs	0% smaller
ethexe/consensus/src/validator/mock.rs	0% smaller
ethexe/consensus/src/validator/producer.rs	0% smaller
ethexe/consensus/src/validator/subordinate.rs	0% smaller
ethexe/consensus/src/validator/tx_pool.rs	0% smaller
ethexe/consensus/src/validator/wait_for_eth_block.rs	0% smaller
ethexe/db/Cargo.toml	Unsupported file format
ethexe/db/src/dump/collect.rs	0% smaller
ethexe/db/src/dump/mod.rs	0% smaller
ethexe/db/src/migrations/migration.rs	0% smaller
ethexe/db/src/migrations/mod.rs	Unsupported file format
ethexe/db/src/migrations/v0.rs	0% smaller
ethexe/db/src/migrations/v1.rs	0% smaller
ethexe/db/src/migrations/v2.rs	0% smaller
ethexe/db/src/migrations/v3.rs	0% smaller
ethexe/db/src/migrations/v4.rs	0% smaller
ethexe/ethereum/src/abi/gear.rs	0% smaller
ethexe/ethereum/src/router/events.rs	Unsupported file format
ethexe/malachite/core/Cargo.toml	Unsupported file format
ethexe/malachite/core/src/app.rs	0% smaller
ethexe/malachite/core/src/codec.rs	Unsupported file format
ethexe/malachite/core/src/config.rs	0% smaller
ethexe/malachite/core/src/context.rs	0% smaller
ethexe/malachite/core/src/externalities.rs	0% smaller
ethexe/malachite/core/src/lib.rs	0% smaller
ethexe/malachite/core/src/service.rs	0% smaller
ethexe/malachite/core/src/signing.rs	0% smaller
ethexe/malachite/core/src/state.rs	0% smaller
ethexe/malachite/core/src/store.rs	0% smaller
ethexe/malachite/core/src/streaming.rs	0% smaller
ethexe/malachite/core/src/types.rs	0% smaller
ethexe/malachite/core/tests/multi_validators.rs	0% smaller
ethexe/malachite/service/Cargo.toml	Unsupported file format
ethexe/malachite/service/src/config.rs	0% smaller
ethexe/malachite/service/src/externalities.rs	0% smaller
ethexe/malachite/service/src/lib.rs	0% smaller
ethexe/malachite/service/src/mempool.rs	0% smaller
ethexe/malachite/service/src/quarantine.rs	0% smaller
ethexe/malachite/service/src/service.rs	0% smaller
ethexe/malachite/service/tests/restart_resilience.rs	0% smaller
ethexe/network/Cargo.toml	Unsupported file format
ethexe/node-loader/Cargo.toml	Unsupported file format
ethexe/node-loader/src/bin/ping_rate_load.rs	0% smaller
ethexe/processor/src/lib.rs	Unsupported file format
ethexe/scripts/start-local-network.sh	Unsupported file format
ethexe/service/Cargo.toml	Unsupported file format
ethexe/service/src/fast_sync.rs	Unsupported file format
ethexe/service/src/lib.rs	Unsupported file format
ethexe/service/src/tests/mod.rs	Unsupported file format

Resolved conflicts by keeping our Announce-removal branch; the master changes that re-introduced Announce types in mock.rs, validator/topic.rs, and service/lib.rs are obsolete and discarded. Renamed the on-chain ChainCommitted event to AnnouncesCommitted to match the master contract; it's a label change only — semantics stays "MB head committed". Pulled in master's proptest helpers (scheduled_task_strategy, schedule_strategy, Arbitrary for MessageType / StateHashWithQueueSize) so the new ethexe-runtime-common::proptest module compiles. Bumped EXPECTED_TYPE_INFO_HASH after the new Arbitrary impls touched the type registry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…r pre-aggregation delay * `ethexe/service/src/lib.rs`: forward `config.node.canonical_quarantine` into `MalachiteConfig` so the producer's `AdvanceTillEthereumBlock` proposals match the depth that participants enforce — otherwise the producer proposes the chain head while validators reject as "needs ≥ default quarantine" and BFT deadlocks. * `ethexe/cli/src/params/node.rs`: default `coordinator_aggregation_delay_ms` to 0. With the MB-driven flow the coordinator no longer has to wait for compute to catch up to a specific Ethereum block (compute keys off `latest_finalized_mb_hash` inside BFT). On anvil's 2 s block time, any non-zero delay caused `CoordinatorBoot`'s pending future to be reset by the next chain head before it could submit, so no batch commitments ever fired in 3-validator local runs. Operators can still tune the value up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replace the single-recipient route (next-coordinator hint) in the injected-tx RPC handler with a fan-out: one `RpcEvent::InjectedTransaction` per validator, recipient pinned. The RPC node-loader sends with a zero recipient, and the previous logic left the tx sitting in only the receiving node's mempool — useful only when that node happened to be the next BFT producer. With broadcast, whichever validator wins the next round can include the tx straight from its local mempool, removing the wait for the RPC-endpoint node to take its turn (which dominated end-to-end promise latency). `forward_transaction` now waits on the fan-out via FuturesUnordered and returns the first `Accept` (or the last `Reject` if every arm rejected). When the validator set isn't known yet — early boot or `Database::memory()` in unit tests — we fall back to the original single-event path so existing tests stay green without fixture updates. Drops the now-unused `route_transaction` / `calculate_next_coordinator` plumbing and their tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`mb_compute::compute_one` was passing `None` for the runtime's `promise_out_tx`, so every promise the executor produced (one per injected dispatch that finishes with a reply) was silently dropped. End result: `injected.send_message_injected_and_watch` subscriptions hung on the loader side because no `SignedPromise` ever reached the RPC layer. Plumb a per-MB unbounded channel through `process_transitions`, drain it after the call returns (all senders are dropped by then, so `recv()` terminates as soon as buffered promises are consumed), accumulate across the predecessor walk, and surface them on `ComputeEvent::MbComputed { promises }`. Service-side: the run loop now grabs the validator's `PrivateKey` via the existing `Signer`, builds a `SignedPromise`, and both gossips it (`network.publish_promise`) and feeds the local RPC server (`rpc.provide_promise`). The local feed is required because gossipsub doesn't echo to the publisher, so a producer that's also the RPC endpoint a client subscribed on would never see the matching promise come back through the network arm. Non-validator nodes don't sign or publish — they get the same vector and discard. Also moves `validator_pub_key: Option<PublicKey>` onto the `Service` struct (alongside the existing `validator_address`) so the run loop can resolve the private key per signing call without re-querying config, and threads it through both the production constructor and the test `new_from_parts`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…n service handler `send_injected_tx` was buried under `#[cfg(any())] mod disabled_until_mb_test_harness_lands`, so nextest skipped it silently. Move it back to the active test scope, drop the per-test imports the active scope already covers, and let it run as part of the regular `ethexe-service` suite. The test's final assertion checks that node-1 has the tx in its `injected_transaction(tx_hash)` keyspace, which is also what the `injected_getTransactions` RPC reads. Neither path was being populated under the MB flow — the service was handing inbound txs straight to the malachite mempool, which is in-memory only. Add a `db.set_injected_transaction(tx)` call on both the RPC and network inbound branches so the local node can serve its own clients' `getTransactions` queries (and the test's assertion now succeeds because broadcast routes the tx to v1's local handler). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The previous wiring drained promises into a `Vec<Promise>` after `process_transitions` returned, then surfaced them as a field on `ComputeEvent::MbComputed`. End-to-end this added the entire MB gas-budget worth of latency to every reply — `gas_allowance = DEFAULT_BLOCK_GAS_LIMIT` (4 trillion) translates to ~4 seconds of sustained execution time on a fully utilised MB, and the loader's median round-trip showed exactly that ~4 s floor. Match the announce-flow pattern from master: * Restructure `MbComputeSubService` around an `MbPromisesStream` alongside the computation future. The stream wraps the receiver end of a per-MB unbounded channel; the executor's `ext_publish_promise` host fn writes to the matching sender. The service polls the stream first on every `poll_next` so promises surface as soon as the runtime emits them, not after the whole block finishes draining gas. * Replace `MbComputed { promises: Vec<Promise> }` with a separate `ComputeEvent::Promise(Promise, H256)` variant emitted one-by-one. `MbComputed` is held back until the promise channel closes (executor done → all sender clones dropped) so promises always arrive before the matching block-finalised marker. * Drop the `pending_event` ordering tripwire so an `MbComputed` doesn't leak ahead of the last buffered `Promise`. * Predecessor MBs walked for crash-recovery still pass `None` for the promise channel — their promises were already gossiped on the earlier run that produced them; re-emitting would just confuse the loader-side dedup. Service-side: `ComputeEvent::Promise` is now its own arm — sign, gossip via `network.publish_promise`, and feed local `rpc.provide_promise` (gossipsub doesn't echo to the publisher). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…nd select `InjectedTxMempool::insert` was firing `notify_waiters()`, which only wakes a `Notified` future that's already parked. The producer's `wait_for_proposable_content` runs let advance = compute_advance_candidate(...); let injected = mempool.fetch(...).await; // (A) if advance.is_some() || !injected.is_empty() { return; } select! { chain_head_notify | mempool.wait_for_new_tx() } // (B) so when a tx lands between (A) and (B), the `notify_waiters()` call finds zero parked Notifieds and the wakeup is *lost*. The producer then falls back to `chain_head_notify`, which only fires every 2 seconds (anvil block time). Direct evidence from the loader run: height 28's proposer entered `GetValue` 30 ms after the loader's tx hit the mempool, then sat in the wait loop for 1.97 s and proposed an MB without the tx — exactly the missing-permit pattern. `notify_one()` keeps the permit pending until a `Notified` consumes it, so a tx racing the select loop boundary now wakes the very next `.notified()` call. The wakeup is still best-effort (producer must re-check `fetch()` afterwards, same as before), but the permit is no longer dropped on the floor. Effect on the 15-minute loader experiment: median promise round-trip should drop from ~4 s (one ETH-block of jitter every iteration) to the actual MB round time, since txs no longer need to wait for the next chain-head event. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Promise gossip used to land in `provide_promise()` before `send_transaction_and_watch` finished its `forward_transaction + pending.accept()` prelude, so by the time the waiter was inserted into `promise_waiters` the matching `Promise` had already been discarded with a "receive unregistered promise" warning. The client subscription then sat forever — explaining the loader hangs that appeared once `notify_one` shaved the producer wakeup latency enough that MB execution started racing the RPC handler. Insert the `oneshot` waiter *before* `forward_transaction` so a promise that beats the broadcast finds a registered receiver. The receiver buffers the value, so even if `pending.accept().await` hasn't completed yet, `spawn_promise_waiter` consumes the buffered promise on its next poll. On any error path (forward failure, subscription accept failure) the waiter is removed before returning so we don't leak entries into `promise_waiters`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ped PING benchmarks Tiny load runner that replays an injected `PING` payload via `send_transaction_and_watch` against a pre-deployed `demo-ping` mirror at a sequence of target tx/s rates. Each step runs for a configurable duration, scheduling fresh sends on a tokio interval (decoupling offered rate from end-to-end latency: in-flight count grows with rate × latency rather than capping throughput). Per-rate output: `rate_<R>.csv` with `wall_ms,latency_ms,message_id` rows — one per completed promise. Errors are logged and counted but don't terminate the run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Without `--network-public-addr` the local docker cluster's nodes have no entry in `external_addresses`, so `validator_discovery` gives up on identity generation and never publishes its DHT record. Result: `network.send_injected_transaction(addr, _)` always returns `ValidatorNotFound` for cross-validator broadcasts, the RPC fan-out drops to local-only delivery, and an injected tx ends up only in the receiving node's mempool. The producer-of-next-MB on the other two validators never sees the tx, so promise round-trips end up gated on the receiver-validator's round-robin proposer turn (~one anvil block × N-validators). Wire each container's deterministic DNS name (`ethexe-node-<i>`) back as `--network-public-addr`. libp2p-identify still does its thing for production deployments where bootnodes hand out the external multiaddr; this flag just unblocks the case where every node sits behind a private docker network. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…an-out The RPC layer broadcasts an injected tx to every validator using the same `transaction_hash` per arm. The service handler stores each arm's `oneshot::Sender` under that hash, and the second remote arm's insert clobbers the first; when the network's `OutboundAcceptance` fires N times in succession the first `.remove()` succeeds but every subsequent one trips `.expect("unknown transaction")` and panics the validator mid-bench. Convert the panic into a quiet no-op. The clobbered senders' receivers already resolve with `Err(_)` (sender dropped), which the RPC fan-out's `FuturesUnordered` treats as just another arm that didn't accept; whichever arm did accept settles the call to `forward_transaction`. We don't lose useful information here — the per-validator acceptance result is invisible to the RPC client anyway. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… synced The RPC fan-out delivers each injected transaction to every validator in parallel via a single libp2p request-response. A recipient whose local DB hasn't yet seen the tx's reference_block (e.g. dell-side validator a few milliseconds behind AWS for the latest Hoodi tip) used to reject at insert — but the RPC has no retry path, so the tx was simply lost on that arm. The producer for the next MB then saw an empty mempool even when the network as a whole had several pending PINGs queued. Soften the insert filter: - accept the tx unconditionally when ref_block hasn't resolved yet; - keep the validity-window check whenever ref_block does resolve; - in purge_expired, retain "unresolved" entries (drop only after the ref_block resolves AND ages out of the window). The fetch-time ancestors filter still gates txs by canonical chain, so unresolved/forked txs sit dormant in the pool until the local DB catches up — at which point fetch surfaces them automatically. Also added info-level logs on insert/fetch/build_block_above so the operator can correlate proposer turns with mempool state. Effect on the live Hoodi cluster (60-PING benchmark, rate=1/s): metric | before | after --------------|--------|------- p50 latency | 8.8 s | 272 ms fast (<300ms) | 10% | 52% tail (>10s) | 43% | 15% The tail above 5s remains due to other architectural gaps (proposer turns landing on an empty mempool right after a finalize); this commit fixes the systematic packet-loss path on the dell-side fan-out arm. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… voting nil When the proposer is even a single Hoodi block ahead of the validators (typical: AWS observer is ~50–200 ms ahead of dell observers), its AdvanceTillEthereumBlock anchor (`head − canonical_quarantine`) sits 1 block too shallow from each validator's local POV. The previous synchronous `validate_block_above` returned `Ok(false)` immediately, which left the engine waiting for the full propose timeout (~13 s) before prevoting nil — at which point the round had already been lost and a round-1 reproposer (often AWS) batched the entire backlog. That accounted for the 50 %+ round-1 rate and the latency tail (p75 = 7.3 s, p90 = 11.4 s) we were seeing in the 60-PING benchmark. Make `validate_block_above` async-poll the local view for up to 2 s before giving up: loop { check head + quarantine + advance_chain_locally_synced; return Ok(true) if all pass; return Ok(false) if past deadline; sleep 50 ms; retry } Within the typical ~100 ms observer lag the validator's chain head catches up and the next iteration validates cleanly. The 2-s budget is well below the engine's 13-s propose timeout, so a genuinely divergent proposal still falls back to prevote nil — no loss of the liveness guarantee. Effect on the live Hoodi cluster (60-PING bench at 1/s): metric | v1 (insert fix only) | v2 (insert + validate-wait) --------|----------------------|----------------------------- p50 | 272 ms | 252 ms p75 | 7352 ms | 259 ms p90 | 11 415 ms | 263 ms max | 14 369 ms | 361 ms rounds | 50 % round-0 fail | 200/200 round-0 success At rate=10 (300 PINGs / 30 s): 87 % < 300 ms, max = 693 ms, no tail. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…dropping When a long backlog of finalized MBs accumulated (e.g. after a coordinator stall, restart, or an earlier batch-rejection storm), the producer assembled a single chain commitment containing transitions from *every* uncommitted MB up to the head. The combined ABI-encoded payload routinely exceeded the 100 KB `batch_size_limit`. `BatchFiller::include_chain_commitment` returned `SizeLimitExceeded`, the error was logged at trace level (invisible by default) and the chain commitment was silently dropped — leaving the same backlog (only larger) for the next coordinator round. The chain on-chain never advanced. Two fixes: 1. `try_include_chain_commitment` now grows the commitment one MB at a time and probes `BatchFiller::would_fit_chain_commitment` before each step. The first MB whose inclusion would push the batch past the size limit stops the walk, leaving the previously-fitting prefix to be committed this round and the rest for a future batch. 2. New helper `BatchFiller::would_fit_chain_commitment` clones the size counter and runs the same `charge_for_chain_commitment` predicate without mutating the live state, so the probe is side-effect free. Also adds info-level diagnostics to `try_include_chain_commitment`, `mb_compute::set_mb_outcome`, `processor::handle_router_event` and `collect_computed_uncommitted_predecessors` so operators can correlate producer turns with the chain backlog and see in-flight chunking decisions without enabling trace-level logs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Surfaces every step of the batch-commitment dance at info level so operators can watch a single round end-to-end without enabling trace: - coordinator: batch built (with digest, transitions, signatures, threshold) - coordinator: validation reply accepted/rejected (with running tally) - coordinator: threshold reached — moving to submission - coordinator: submitting batch commitment to Router - coordinator: batch commitment landed on-chain (or failed, with error) - participant: accepting batch — signing reply - participant: rejecting batch validation request (with reason) Used these to confirm two facts on the running Hoodi cluster: 1. After the chunking fix, AWS-as-coordinator turns now reliably reach threshold (3/3) in <200 ms and submit on-chain. 2. dell-as-coordinator turns currently stop at 2/3 signatures — the third reply (typically from AWS) doesn't arrive within the round window. Same AWS<->dell geographic asymmetry that drove our earlier round-1 frequency on MB consensus before the validate-wait fix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…+ verify level Surfaces the libp2p gossipsub delivery path so operators can correlate coordinator rounds with which validator replies actually crossed the network and which ones were stalled or dropped: - gossipsub: received raw message (source/propagation_source/topic/data_len) - validator-topic: accepting message (signer/kind) Used these to root-cause the "AWS-coord stuck at 2/3 signatures" symptom on the running Hoodi cluster: 1. node-1 receives the validation request before its consensus state machine has caught up to the matching chain head, so the request is deferred ("WAIT_FOR_ETH_BLOCK ... saved for later"). By the time the request is replayed, latest_finalized_mb has moved on and the head MB referenced no longer passes `is_ancestor_or_equal` — node-1 rejects the deferred request and never replies. 2. node-2's gossipsub reply for AWS-coord block 2747638 arrived at AWS ~4 s after node-2 published it (cross-AS propagation + mesh hops), well past the coordinator's window for that round. Combined with the chunking fix from 68c900f, this restores commits to ~30 s cadence (when AWS is coord and at least one dell reply arrives in time) — a clear regression compared to a fully-collocated mesh, but non-blocking for state progression. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Participants of a batch-commitment validation request rejected the coordinator's `head_mb` whenever the participant's local `latest_finalized_mb_hash` was even slightly older than the head: the legacy check walked back from `latest_finalized_mb` only, so a participant that had already computed `head_mb` via a speculative BlockProposal — but whose `mark_block_as_finalized` cascade had not yet reached it — saw the head as "not on chain" and emitted `ValidationRejectReason::HeadMbNotInChain`. With AWS as coordinator this consistently dropped one dell signature per round, leaving the batch stuck at 2/3 and never landing on-chain. BFT guarantees a unique parent chain for every decided MB, so chain membership is symmetric: if either endpoint is reachable from the other via `parent_mb_hash`, both are on the canonical chain. Walk both directions: 1. From `latest_finalized_mb` back — handles the original case where `head_mb` is older. 2. From `candidate` back — handles the case where the participant is trailing and `head_mb` is newer. Only when neither walk reaches the other endpoint do we treat it as a real fork / missing-data condition. Also routed both `validate_batch_commitment` rejection paths through `tracing::warn!` with the relevant context (head, latest, computed, synced) so future regressions surface at info-level without a trace dive. Effect on the live Hoodi cluster: - Before: AWS coord max signatures=2/3, dell coord 2/3, every batch was discarded. `latestCommittedBatchHash` advanced once after the chunking fix, then stalled. - After: AWS coord routinely hits signatures=3 (~150 ms after broadcast on a typical round), `coordinator: batch commitment landed on-chain`, Router updates, mirror state transitions land, fresh `createProgram` → state-transition commit takes <15 s end-to-end. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Cluster-side debugging is over and the per-event info-level traces from 68c900f / c17594d / ee518eb / 543e1bf became steady-state noise. Trim them back so info: stays meaningful in production: - network/gossipsub: remove the per-message "received raw message" log. - network/validator-topic: remove the per-message "accepting message" log; drop the now-unused tracing dep and helper bindings. - processor: revert "registering new program" back to log::trace! (was promoted purely for the live debug session). - compute/mb_compute: drop the "outcome stored" info log (per-MB churn). - consensus/coordinator: demote per-reply "validation reply accepted", "batch built", "submitting", "threshold reached" → debug. Keep the sole user-facing success line "batch commitment landed on-chain" at info; merged the failed-submit path into the existing Warning event. - consensus/participant: demote "accepting batch" → debug; warn-on-reject stays. - consensus/utils: drop the per-round "aggregation done" / "walk OK" / "stopped at not-yet-computed MB" info logs. The warn on a parent walk that fails to reach `last_committed_mb` stays — that one indicates a real problem. Also drop tracing.workspace = true from network/processor/compute Cargo.toml (only the removed logs needed it). Test fix-ups carried over from earlier mempool work (d178a70): - is_ancestor_resolves_proper_ancestor: bidirectional walk treats the reverse direction as on-chain too — assertion flipped to match the documented semantics. - new is_ancestor_returns_false_on_disjoint_chains covers the separate-fork case. - mempool::insert_unknown_ref_block_is_rejected → renamed to insert_unknown_ref_block_is_accepted; insert is now lenient on unresolved ref_block, filter happens at fetch. - mempool::wait_for_new_tx_does_not_wake_on_rejected_insert: rebuilt to exercise the duplicate-insert reject path instead of the unknown-ref path that no longer rejects. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

….finalized Batch validation no longer walks the parent chain to confirm `head_mb` is on the participant's canonical view. Instead: - New `MbMeta::finalized` flag; flipped to `true` exactly once per MB inside Malachite's `mark_block_as_finalized` cascade. - Validation reads `head_meta.finalized` (O(1)) and rejects with `HeadMbNotFinalized` if it is not yet set locally. - Validation also reads `mb_compact_block(head_mb).height` and rejects with the new `HeadMbAlreadyCommitted` reason if `head_mb.height <= last_committed_mb.height` — coordinator must always advance past the on-chain anchor. - `is_ancestor_or_equal` (and its tests) removed entirely. Why this is sound: BFT-safety guarantees that any two finalized MBs in any honest validator's view are linearly ordered (no two finalized MBs at the same height — that would be a safety violation). So `meta.finalized = true` already proves the MB is on the same canonical chain as `last_committed_mb` (which itself was finalized at the time its multi-sig commit landed on Router). The previous bidirectional walk was a workaround for not having an explicit `finalized` flag — it accepted a `head_mb` that was speculatively computed (via `BlockProposal`) but not yet finalized. That permissiveness was the original symptom we patched in 543e1bf, but the underlying design is cleaner with an explicit flag and stricter semantics: - A participant whose `mark_block_as_finalized` cascade hasn't reached `head_mb` yet (cross-AS gossip lag from the BFT decision) drops the signature for that round. The coordinator's next attempt picks up the participant once it catches up. Mitigated at the deployment level by `coordinator_aggregation_delay` (currently 3s on Hoodi). - Speculative computation can no longer produce a chain commitment for a not-yet-finalized block — Router state can never diverge from the BFT-decided chain, even under speculative-vs-decided divergence (an unusual fault pattern we previously had no defense against). `MbMeta` schema hash updated; all `MbMeta` mutations are via closure so existing `mutate_mb_meta` callers don't need touching. New tests: - `rejects_head_mb_not_finalized_locally`: replaces `rejects_head_mb_not_in_chain` under the new naming. - `rejects_head_mb_at_or_below_last_committed_mb`: covers the new height-advance guard. - `externalities::tests::finalize_advances_globals_and_emits_event`: asserts `meta.finalized` flips on the cascade. - `externalities::tests::save_block_*` (existing): asserts `meta.finalized` stays `false` until the cascade. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`MbMeta` gained a `finalized: bool` field in 458bfca. The new field goes inside the SCALE encoding so old records (34 bytes) cannot be decoded as the new layout (35 bytes). Existing on-disk databases must be wiped and re-initialised — bump the version constant so the explicit error fires instead of silent decode corruption at the next read. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…of MbMeta field Earlier in 458bfca / 02167df I added an `MbMeta::finalized` flag and bumped LATEST_VERSION to 6 to expose finalization as O(1) state. The flag is correct in code, but the SCALE schema break forces a wipe of every validator's on-disk state — and on a deployed cluster without a coordinated wipe across all nodes (one of which I do not operate), wipe-and-resync also resets the validator's view of the chain history that the Router contract still references via `last_committed_mb`. So the schema bump was the wrong primitive for the same correctness goal. Revert those two commits' on-disk shape: - `MbMeta`: drop `finalized`, restore the original three fields and the type-info hash. - `LATEST_VERSION`: back to 5. - `mark_block_as_finalized` no longer mutates `mb_meta`. Same strict semantics as 458bfca, just implemented as a one-pass walk instead of an indexed flag. New `utils::is_finalized_locally` walks back from `globals().latest_finalized_mb_hash` via `mb_compact_block.parent` and returns `true` iff `candidate` is reachable. Sound by BFT-safety: any two BFT-decided MBs are linearly ordered, so reachability through the parent pointer is iff for "finalized locally". Walk depth is bounded by the height gap between `latest_finalized_mb` and `head_mb` — single-digit in steady state (`coordinator_aggregation_delay / mb_block_time`). Behaviorally identical to the flag-based version: - Coordinator's `head_mb` finalized locally → walk finds it → accept. - Participant's finalization cascade lags behind coordinator's `head_mb` (cross-AS gossip, late vote propagation) → walk doesn't reach it → reject. Coordinator's next attempt picks up this participant once its cascade catches up. Speculative `BlockProposal` paths can still produce computed-but-not- finalized MBs in the local DB; the walk does not consider them, so chain commitments cannot reflect speculative-and-later-discarded blocks — same correctness gain as the flag. `HeadMbNotFinalized` and `HeadMbAlreadyCommitted` rejection reasons keep the new naming. `is_ancestor_or_equal` (and its tests) stay removed. New tests: - `is_finalized_zero_candidate_is_universally_finalized` - `is_finalized_self_is_finalized` - `is_finalized_resolves_proper_ancestor_of_finalized_head` - `is_finalized_returns_false_for_descendant_of_finalized_head` - `is_finalized_returns_false_when_no_local_finalization` - `is_finalized_returns_false_on_disjoint_chain` Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

grishasobol and others added 24 commits April 24, 2026 16:48

initial

2c27c72

wip(ethexe/malachite): rolling eth_head_history + quarantine anchor

ab4d2e2

Intermediate state before switching the producer to pull quarantine status directly from the Database. This commit is about to be superseded.

append mb computation

86a431e

split to service and core

32be184

different fixes

115c328

remove accessing block by height, fix blocks hashing

f4a7ac2

fix gas_allowance usage

1b448d8

pass whole network restore; fix problems with not synced eth blocks i…

d379295

…n porposals

remove Announces

9d1c4d0

return some ethexe-service tests back

e315121

grishasobol added the A1-inprogress Issue is in progress or PR draft is not ready to be reviewed label Apr 30, 2026

grishasobol and others added 4 commits April 30, 2026 14:12

start-local-network.sh fix

fa95b8b

append promise waiting time printing in for injected

21c5453

grishasobol and others added 19 commits April 30, 2026 20:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat!(ethexe): malachite#5397

feat!(ethexe): malachite#5397
grishasobol wants to merge 47 commits intomasterfrom
gsobol/ethexe/malachite

grishasobol commented Apr 30, 2026

Uh oh!

semanticdiff-com Bot commented Apr 30, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

grishasobol commented Apr 30, 2026

Uh oh!

semanticdiff-com Bot commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

semanticdiff-com Bot commented Apr 30, 2026 •

edited

Loading