All-LogUp auxiliary trace bus constraints#2962

Open
adr1anh wants to merge 97 commits into next from adr1anh/bus


@adr1anh adr1anh commented Apr 5, 2026

All-LogUp Auxiliary Trace

What this PR does

Every auxiliary-trace bus in the Miden VM used to be described by a bespoke constraint: some were running-product (multiset) checks written out by hand, others were partial LogUp arguments. Each lived in its own per-component file, and the aux-trace generation code was spread across processor/src/trace/{decoder,range,stack,chiplets}/aux_trace*.rs, duplicating the algebra row-by-row in imperative Rust.

This PR replaces all of that with a single unified LogUp formulation, described declaratively. Every bus — the block hash table, op-group table, stack overflow, range check, chiplet request bus, sibling table, hasher perm-link, ACE wiring, kernel ROM, and the rest — is now a branch of one descriptive AIR (MidenLookupAir) expressed against a small closure-based builder trait. The same description is consumed by the verifier's constraint evaluator, by the prover's aux-trace generator, by the recursive-verifier ACE circuit, and by two debug walkers.

The practical consequences:

  • The final layout is 4 extension columns on the main trace and 3 on the chiplet trace (seven total). The 4+3 split is deliberate: in a future multi-trace setup, the main-trace column count lands under the next power-of-two boundary without padding.
  • Only one committed boundary value is needed per trace. Two slots are committed in total, with a forced-zero placeholder in the second slot until trace-splitting lands (see NUM_LOGUP_COMMITTED_FINALS). Seven column accumulators collapse into one partial-sum column per trace.
  • The kernel ROM chiplet is now purely lookup-based. Previously it duplicated a row per callsite; now one row per digest carries a multiplicity. For realistic kernels this deletes thousands of chiplet rows.
  • The per-component aux-trace Rust code is gone, not dual-maintained. Aux-trace generation now walks the same LookupAir::eval description the constraint side does.
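
To make the kernel ROM point concrete, here is a minimal sketch over a toy prime field (2^31 - 1, not the VM's field). The names `per_callsite_sum` and `per_digest_sum`, the challenge `alpha`, and the scalar digest encoding are all hypothetical. It only illustrates that k unit-multiplicity rows for one digest contribute the same LogUp sum as a single row with multiplicity k.

```rust
const P: u64 = 2_147_483_647; // toy prime 2^31 - 1, NOT the VM's field

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn mul(a: u64, b: u64) -> u64 { a * b % P }

/// Square-and-multiply exponentiation in the toy field.
fn pow(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    while exp > 0 {
        if exp & 1 == 1 { acc = mul(acc, base); }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

/// Inverse via Fermat's little theorem; `a` must be nonzero.
fn inv(a: u64) -> u64 { pow(a, P - 2) }

/// Old shape: one row per call site, each contributing 1 / (alpha + digest).
fn per_callsite_sum(alpha: u64, calls: &[u64]) -> u64 {
    calls.iter().fold(0, |acc, &d| add(acc, inv(add(alpha, d))))
}

/// New shape: one row per distinct digest, contributing m / (alpha + digest).
fn per_digest_sum(alpha: u64, rows: &[(u64, u64)]) -> u64 {
    rows.iter().fold(0, |acc, &(d, m)| add(acc, mul(m, inv(add(alpha, d)))))
}
```

With five call sites hitting two digests, the first function walks five rows while the second walks two, and both return the same field element.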

How the code is laid out

There are two halves, and it's worth keeping them separate in your head while reading.

The generic LogUp framework lives in air/src/lookup/. It is deliberately free of any Miden-specific types; the intent is to eventually extract it into its own crate. You'll find the closure-based builder stack (LookupBuilder → LookupColumn → LookupGroup → LookupBatch), a LookupAir trait modelled on p3_air::Air, and three kinds of concrete walkers of a LookupAir description:

  • ConstraintLookupBuilder — symbolic evaluation for the verifier and the prover's quotient pass.
  • ProverLookupBuilder — concrete-row fraction collection, followed by batch inversion and a fused row-major chunk-parallel accumulator.
  • Two debug walkers under debug/: one that validates the shape of a LookupAir (column count, declared-vs-observed degrees per interaction, canonical/encoded equivalence for cached-encoding groups), and one that balances a real concrete trace, producing a human-readable report of unmatched adds/removes. The second is not yet wired into the prover/processor paths; that's one of the follow-ups.

The pattern is essentially Plonky3's AirBuilder pattern: a LookupAir describes itself once, against an interface, and each concrete builder interprets that description differently. It's also essentially a serializer pattern — the shape is written once, many things read it.
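
A minimal sketch of the "describe once, interpret many times" idea follows. None of these names are from the PR: `Builder`, `Describe`, `PushPop`, `ShapeWalker` and `BalanceWalker` are hypothetical stand-ins, and the real builder stack has more levels (column, group, batch).

```rust
use std::collections::HashMap;

/// What an interpreter must provide: a way to receive one interaction.
trait Builder {
    fn interaction(&mut self, name: &'static str, multiplicity: i64, value: u64);
}

/// A description written once against the `Builder` interface.
trait Describe {
    fn eval<B: Builder>(&self, builder: &mut B);
}

/// Hypothetical two-interaction bus: push `value`, then pop it again.
struct PushPop { value: u64 }

impl Describe for PushPop {
    fn eval<B: Builder>(&self, builder: &mut B) {
        builder.interaction("push", 1, self.value);
        builder.interaction("pop", -1, self.value);
    }
}

/// Interpreter 1: records only the shape (interaction names).
#[derive(Default)]
struct ShapeWalker { names: Vec<&'static str> }

impl Builder for ShapeWalker {
    fn interaction(&mut self, name: &'static str, _m: i64, _v: u64) {
        self.names.push(name);
    }
}

/// Interpreter 2: nets multiplicities per value to check the bus balances.
#[derive(Default)]
struct BalanceWalker { net: HashMap<u64, i64> }

impl Builder for BalanceWalker {
    fn interaction(&mut self, _name: &'static str, m: i64, v: u64) {
        *self.net.entry(v).or_insert(0) += m;
    }
}

impl BalanceWalker {
    fn is_balanced(&self) -> bool { self.net.values().all(|&m| m == 0) }
}
```

The description (`PushPop::eval`) never changes; swapping the builder changes what the walk produces.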

The Miden-specific wiring lives in air/src/constraints/lookup/. Here you'll find one file per extension column, under buses/. This is the place to read if you want to know "what bus is this row pushing into, and how does the interaction encode?" Start with miden_air.rs (the thin aggregator), then main_air.rs and chiplet_air.rs for the two halves, then messages.rs for the full catalogue of typed message structs (25 BusId variants). Every bus message is a concrete struct with its own encoding; each bus has a dedicated domain (BusId::*) so multiple linearly-independent buses can sit on one column via distinct bus_prefix prefixes.

The key conceptual unit inside a column is a group. Groups within a column are additively independent — their degrees add. Within a group, the branches are mutually exclusive: only one fires per row, so the group's degree is the max across branches, not the sum. A branch may be a single interaction or a batch of simultaneous interactions sharing one flag. Every add/remove/batch/insert call is annotated with its numerator and denominator degree, and those annotations are cross-checked by the validator against the symbolic degree actually accumulated. That annotation is what made the current packing (everything inside degree 9) tractable; the same metadata could drive an automated packer later.
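
The degree rule can be modelled in a few lines. This is a simplified sketch under stated assumptions, not the PR's folding formulas (those are documented in the constraint builder): `Deg`, `group_degree` and `column_degree` are hypothetical names, exclusive branches are assumed to take a componentwise max, and independent groups are assumed to combine like a sum of fractions N1/D1 + N2/D2 = (N1·D2 + N2·D1)/(D1·D2).

```rust
/// Declared degrees of one interaction: numerator `n`, denominator `d`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Deg { n: usize, d: usize }

/// Mutually exclusive branches: at most one fires per row, so the group
/// only needs the largest numerator and denominator degree.
fn group_degree(branches: &[Deg]) -> Deg {
    branches.iter().fold(Deg { n: 0, d: 0 }, |acc, b| Deg {
        n: acc.n.max(b.n),
        d: acc.d.max(b.d),
    })
}

/// Independent groups are summed as fractions, so denominator degrees add
/// and each numerator is multiplied by the other group's denominator.
fn column_degree(groups: &[Deg]) -> Deg {
    groups
        .iter()
        .copied()
        .reduce(|a, b| Deg { n: (a.n + b.d).max(b.n + a.d), d: a.d + b.d })
        .unwrap_or(Deg { n: 0, d: 0 })
}
```

Under this model, adding a branch to an existing group is free as long as it stays under the group's current max, whereas adding a new group always raises the column's degree.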

Messages and domain separation

We used to derive bus labels by bit-reversing operation codes. That's gone. Each distinct bus interaction type is now a named BusId variant, used directly as an index into the bus_prefix table. We've also split several messages that previously shared an id: for example, the hasher's various input/output paths now each carry their own variant rather than sharing a generic "hasher message". messages.rs is the centralised place to review domain separation; some of it was Claude-generated and there are still obvious simplifications we'd like to fold in.
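
As a rough illustration of prefix-based domain separation, the sketch below encodes a message as bus_prefix[id] plus a β-weighted sum of the payload, over a toy prime field. The three variants, the table sizes, the starting power of β and the `encode` signature are assumptions for illustration only; the PR's `BusId` has 25 variants and its own encoding conventions.

```rust
const P: u64 = 2_147_483_647; // toy prime 2^31 - 1, NOT the VM's field

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn mul(a: u64, b: u64) -> u64 { a * b % P }

/// Hypothetical bus identifiers, cast directly to a table index.
#[allow(dead_code)]
#[derive(Clone, Copy)]
#[repr(usize)]
enum BusId { BlockStack = 0, RangeCheck = 1, KernelRom = 2 }

const NUM_BUS_IDS: usize = 3;
const MAX_WIDTH: usize = 4;

struct Challenges {
    bus_prefix: [u64; NUM_BUS_IDS], // one domain-separating term per bus
    beta_powers: [u64; MAX_WIDTH],  // weights for the payload elements
}

impl Challenges {
    /// denominator = bus_prefix[bus] + sum_i beta_powers[i] * payload[i]
    fn encode(&self, bus: BusId, payload: &[u64]) -> u64 {
        assert!(payload.len() <= MAX_WIDTH, "payload wider than the table");
        payload
            .iter()
            .zip(self.beta_powers.iter())
            .fold(self.bus_prefix[bus as usize], |acc, (&x, &b)| add(acc, mul(x, b)))
    }
}
```

Two buses that happen to carry the same payload still produce different denominators, because only the prefix differs.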

Outer / boundary interactions

Public-input-driven terms (program hash seed into the block hash queue, the log-precompile transcript bookends, kernel ROM initialisation from the variable-length public inputs) are emitted through a separate eval_boundary hook on LookupAir that takes a BoundaryBuilder. The debug trace walker consumes this directly. The verifier's scalar check (ProcessorAir::reduced_aux_values) now also drives through eval_boundary via a BoundaryBuilder reducer — so this code path is unified. The MASM recursive verifier still has its own copy of these boundary formulas; unifying the third source is a follow-up.

ACE and MASM recursive verifier

The ACE circuit's LogUp boundary identity (0 = Σ aux_bound + c_block_hash + c_log_precompile + c_kernel_rom) is added in air/src/ace.rs. On the MASM side, aux_trace.masm now does a single Poseidon2 absorb of [COM, W0] (two boundary felts, second always zero), and public_inputs.masm has been reworked to sum the variable-length public inputs as LogUp fractions rather than running the old multiset reduction. The constraint evaluation MASM (constraints_eval.masm) is regenerated by a new regenerate-constraints binary in crates/lib/core/src/bin/.

Tests

The legacy per-component aux-trace unit tests (processor/src/trace/tests/{decoder,stack,range,hasher,chiplets/*}.rs) were deleted alongside their targets. In their place are two new files: lookup_harness.rs builds a column-blind, subset-based expectation DSL over (multiplicity, encoded_denominator) pairs, and lookup.rs is an end-to-end smoke test that drives a tiny real trace through the prover's collection pipeline. The existing tests were heavily rewritten through this harness, much of it Claude-assisted — that's one of the areas worth the most scrutiny. Restoring granular per-component coverage is a blocker before we flip off WIP.

Reviewing this PR

For an efficient walk, I'd suggest:

  1. air/src/constraints/lookup/miden_air.rs and messages.rs — establish the final shape and the bus catalogue.
  2. air/src/constraints/lookup/buses/*.rs — one file per column. Each is the authoritative description of one bus's semantics; read these instead of the per-component design docs (which haven't caught up).
  3. air/src/lookup/builder.rs — the four builder traits. Everything else in air/src/lookup/ is an implementation of these against a specific target (constraint eval, prover, debug).
  4. air/src/lookup/constraint.rs + prover.rs — the two production walkers.
  5. ACE + MASM: air/src/ace.rs, crates/lib/core/asm/sys/vm/{aux_trace,public_inputs,constraints_eval}.masm.
  6. Processor tests: processor/src/trace/tests/lookup{,_harness}.rs and the rewritten per-component files.

Deeper walkthrough: design rationale and nuances

Why a new description layer at all

The previous structure described each bus independently, and nothing enforced consistency between what the constraint code asserted and what the aux-trace builder computed. Different components used different conventions (multiset vs. LogUp), different column layouts, and different ad-hoc label schemes. After this PR, one description drives four concrete uses: the verifier's constraint evaluator, the prover's quotient pass, the prover's aux-trace generator, and the recursive ACE circuit. This unification is the point — not LogUp itself, which was only a means to get there.

Because the description is canonical, the current docs under docs/src/design/lookups/ and the per-chiplet bus pages no longer match the code. We have an issue open for a docs rewrite. The argument for not going through another round of exhaustive LaTeX is that the new code is much more self-describing: the per-bus file names, BusId enum, LookupMessage structs, and Deg annotations spell out the interaction inventory directly. Docs can lean on that and stop duplicating formulas.

How to read a bus file

Start at the top with MidenLookupAir. It defines the lookup argument for both the main trace and the chiplet trace. At the moment we have two sub-AIRs (MainLookupAir, ChipletLookupAir), each carrying its own declared shape: the number of columns, the max interactions per column (which hints to the adapter how many fractions to buffer per row), and the bus-ID count required for deriving challenges and sizing the bus_prefix table.

Each sub-AIR also carries a lookup-specific context — the collection of flags that gate interactions. This is built once per eval call and shared between all the columns in that AIR. The flags are designed to express mutual exclusivity: if two flags in a group cannot be simultaneously active, the group's degree is the max of the branches rather than the sum. The context is very similar to the old OpFlags / ChipletFlags but has been re-implemented as its own type for two reasons. First, the constraint-path construction uses polynomial arithmetic (the default), but the prover-path adapter can override the constructor to use boolean arithmetic on the concrete 0/1 decoder bits, which is meaningfully cheaper. Second, it pins down the smaller surface the LogUp code actually needs, independently of the full OpFlags. The override is the hook for the follow-up where we want to hint to the aux-trace generator that all flags on a row are inactive so the whole row can be skipped — padding rows are free wins there.
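
The override hook can be pictured as a trait method with a polynomial default that a prover-side type replaces with boolean logic. This is only a sketch: `FlagBuilder`, `flag_101`, `ConstraintPath` and `ProverPath` are hypothetical names, the three-bit selector is made up, and the toy field stands in for symbolic expressions.

```rust
const P: u64 = 2_147_483_647; // toy prime 2^31 - 1, NOT the VM's field

fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul(a: u64, b: u64) -> u64 { a * b % P }

/// Hypothetical selector: the flag is active when bits == [1, 0, 1].
trait FlagBuilder {
    /// Default path: polynomial arithmetic b0 * (1 - b1) * b2, which is
    /// also valid when the inputs are symbolic rather than concrete.
    fn flag_101(&self, bits: [u64; 3]) -> u64 {
        mul(mul(bits[0], sub(1, bits[1])), bits[2])
    }
}

/// Constraint side: keeps the polynomial default.
struct ConstraintPath;
impl FlagBuilder for ConstraintPath {}

/// Prover side: overrides with boolean logic on concrete 0/1 bits.
struct ProverPath;
impl FlagBuilder for ProverPath {
    fn flag_101(&self, bits: [u64; 3]) -> u64 {
        (bits[0] == 1 && bits[1] == 0 && bits[2] == 1) as u64
    }
}
```

Both paths agree on every 0/1 input, so bus emitters written against the trait do not need to know which one they are running on.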

Walking through a column: you start with a column, containing groups. Groups in the same column are independent — their degrees add. Within a group the branches are mutually exclusive; only one fires per row, and the group's degree is the max. A branch can be a single interaction or a batch of simultaneous interactions sharing the outer group's flag. Every interaction carries a name and a declared (n, d) degree pair, both of which are cross-checked by the validator. Naming and annotating is the reason we were able to find the current packing at all: we extracted the per-interaction degrees and ran a small optimisation pass to find a tighter column layout. That same metadata can drive a brute-force packer next time we need to add an interaction.

The builder pattern

The entry point for the framework is LookupAir — a structure that describes itself to an interface. That interface is the LookupBuilder stack (column → group → batch). This is the same pattern as Plonky3's AirBuilder, and morally the same as serialisation: a struct describes itself once, and different interpreters do different things with that description.

There are three interpretations:

  1. ConstraintLookupBuilder (air/src/lookup/constraint.rs) wraps a LiftedAirBuilder. As it walks the description, it keeps a single rational-function accumulator per scope (column, group, or batch). When a scope closes, its accumulator folds into its parent — the walk is linear, no intermediate storage. Because the target is a generic AirBuilder, the resulting expression can be fed to the prover's quotient computation, the verifier's OOD evaluation, and the ACE circuit evaluation — three consumers of the same description without code duplication. The formulas for how a group composes into its parent are documented in the file.

  2. ProverLookupBuilder (air/src/lookup/prover.rs) is the aux-trace path. It runs in three passes: first it walks the AIR to collect every (multiplicity, denominator) pair per column into a flat buffer (LookupFractions). Second it batch-inverts the denominators. Third it sums the inverses into per-column partial-sum accumulators. Only one auxiliary value needs to be sent per trace at the end — a trick enabled by the generic description, and the reason we went from 7+ committed finals down to 1 per trace (2 total, with the trailing zero for forward-compatibility). Only active branches (flag ≠ 0) get the closure evaluated, saving arithmetic on rows where the interaction isn't firing. The code hasn't been benchmarked carefully yet but it already parallelises; there's headroom. One follow-up on the to-do list is to change the aux-builder traits in our lifted-stark layer to take the LookupAir directly — a larger refactor that would eventually push these lookups purely onto the processor trait.

  3. Debug walkers (air/src/lookup/debug/). There are two, for two different kinds of check:

    • debug/trace/ balances a full concrete trace. It walks the AIR while summing all interactions, and emits a human-readable report of which rows have unmatched adds/removes. This is the replacement for the old bus-debugger infrastructure; the mechanism is in place but we haven't hooked it into the prover/processor paths yet. This also requires eval_boundary to be handled correctly, which it now is.
    • debug/validation/ validates the shape of the description: number of columns, declared degrees match observed symbolic degrees, encoding equivalence for cached-encoding groups (more on this below), scope discipline. Single entry point: MidenLookupAir::validate(layout).
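
The prover path's second and third passes can be sketched as follows, assuming the first pass has already collected (multiplicity, denominator) pairs per row for one column. Everything here is illustrative: the toy field, `batch_inverse`, `partial_sums`, the per-row `Vec` layout and the "one extra leading zero" convention are assumptions, and a remove is modelled as multiplicity P - 1.

```rust
const P: u64 = 2_147_483_647; // toy prime 2^31 - 1, NOT the VM's field

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn mul(a: u64, b: u64) -> u64 { a * b % P }
fn pow(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    while exp > 0 {
        if exp & 1 == 1 { acc = mul(acc, base); }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}
fn inv(a: u64) -> u64 { pow(a, P - 2) }

/// Montgomery's trick: invert all (nonzero) values with one field inversion.
fn batch_inverse(values: &[u64]) -> Vec<u64> {
    // prefix[i] = values[0] * ... * values[i - 1]
    let mut prefix = Vec::with_capacity(values.len());
    let mut acc = 1;
    for &v in values {
        prefix.push(acc);
        acc = mul(acc, v);
    }
    let mut inv_acc = inv(acc); // the only inversion
    let mut out = vec![0; values.len()];
    for i in (0..values.len()).rev() {
        out[i] = mul(inv_acc, prefix[i]);
        inv_acc = mul(inv_acc, values[i]);
    }
    out
}

/// Pass 2 inverts every denominator; pass 3 builds the running partial sum.
/// Entry r + 1 of the result is the sum over rows 0..=r.
fn partial_sums(rows: &[Vec<(u64, u64)>]) -> Vec<u64> {
    let denoms: Vec<u64> = rows.iter().flatten().map(|&(_, d)| d).collect();
    let mut inverses = batch_inverse(&denoms).into_iter();
    let mut acc = 0;
    let mut column = Vec::with_capacity(rows.len() + 1);
    column.push(acc);
    for row in rows {
        for &(m, _) in row {
            acc = add(acc, mul(m, inverses.next().unwrap()));
        }
        column.push(acc);
    }
    column
}
```

A balanced bus ends with a final partial sum of zero, which is why only one boundary value per trace needs to be committed.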

The cached-encoding escape hatch

The biggest tension in this design was between two performance axes: aux-trace generation efficiency and constraint-evaluation efficiency. The canonical description is optimised more heavily for the former — it reads cleanly and only fires closures on active flags. Constraint evaluation, however, must evaluate every branch symbolically, and the canonical description has a lot of repetition. The hasher responses are the worst offender: each variant uses a linear combination of (α + β·addr + β²·node_idx)-shaped prefixes that would ideally be memoised across branches.

To address this, a group can optionally carry a second, arithmetic-form description alongside its canonical one. This is the group_with_cached_encoding API. The constraint-path adapter consumes the encoded form where supplied, giving the expression tree a much better chance at common-subexpression elimination (notably in the ACE codegen, which does aggressive CSE). Part of the validation walker's job is to ensure both representations are equivalent — it walks both paths and verifies the resulting expressions match using the Schwarz–Zippel lemma. We've only applied this to the hasher chiplet so far, where the per-variant repetition is worst; it's a follow-up to sweep the rest.
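
A toy version of the equivalence check: two ways of writing the same two-branch encoding, one repeating the shared prefix per branch and one computing it once, compared at pseudo-random points. The column names, the `canonical`/`cached` split and the xorshift generator are hypothetical, and the real check runs on symbolic expressions over the VM's field rather than on `u64` values.

```rust
const P: u64 = 2_147_483_647; // toy prime 2^31 - 1, NOT the VM's field

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn mul(a: u64, b: u64) -> u64 { a * b % P }

/// Inputs: [alpha, beta, addr, idx, f1, f2, x, y] (hypothetical columns).
/// Canonical form: each branch re-encodes the full message.
fn canonical(v: &[u64; 8]) -> u64 {
    let [alpha, beta, addr, idx, f1, f2, x, y] = *v;
    let b2 = mul(beta, beta);
    let b3 = mul(b2, beta);
    let enc = |p: u64| add(add(alpha, mul(beta, addr)), add(mul(b2, idx), mul(b3, p)));
    add(mul(f1, enc(x)), mul(f2, enc(y)))
}

/// Cached form: the shared prefix is computed once and reused.
fn cached(v: &[u64; 8]) -> u64 {
    let [alpha, beta, addr, idx, f1, f2, x, y] = *v;
    let b2 = mul(beta, beta);
    let b3 = mul(b2, beta);
    let prefix = add(add(alpha, mul(beta, addr)), mul(b2, idx));
    add(mul(add(f1, f2), prefix), mul(b3, add(mul(f1, x), mul(f2, y))))
}

/// Schwarz-Zippel style test: distinct low-degree polynomials rarely agree
/// at a random point, so agreement at many points suggests equality.
fn probably_equal(f: fn(&[u64; 8]) -> u64, g: fn(&[u64; 8]) -> u64, trials: usize) -> bool {
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15; // fixed seed, xorshift64
    (0..trials).all(|_| {
        let mut point = [0u64; 8];
        for p in point.iter_mut() {
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            *p = state % P;
        }
        f(&point) == g(&point)
    })
}
```

Calling `probably_equal(canonical, cached, 64)` returns true here because the two forms are algebraically identical; a mistake in the cached form would make some evaluation disagree with high probability.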

The MASM and ACE side

The ACE circuit was modified to remove support for multiset reductions and to match the new aux-value layout. We build it with only as many boundary values as there are traces — so 1 for the current single-trace setup, but we allocate 2 (with the second forced zero) as a forward-looking hack, because the MASM recursive verifier already needs to absorb 2 boundary felts in its forthcoming trace-split layout. Updating MASM later to drop the slot is cheaper than updating it twice.

On the MASM side, the layout now expects two extension-field elements for the bus boundary. The larger change was to public_inputs.masm: variable-length public inputs now only need to be summed up as LogUp fractions rather than run through a multiset reduction. That collapses a chunk of code. As a follow-up, we want to reduce all the outer-boundary values in MASM, since they all have the same shape now — much more practical than it used to be.

Despite the constraint-evaluation builder introducing a lot of structural duplication (each branch evaluates its own encoding), the number of inputs and eval gates in the ACE circuit is very similar to before. The ACE codegen's CSE pass absorbs the redundancy.

The test rewrite

Almost every test under processor/src/trace/tests/ was rewritten. The reason is blunt: the aux-trace format changed, so the old assertions were meaningless. The rewrite tried to shift tests from asserting on the final trace shape to asserting on the semantic interactions — each test now declares "for this row, I expect these (multiplicity, message) pairs" and the harness checks that as a multiset subset of the actual prover emissions. Tests are more descriptive as a result. The harness is processor/src/trace/tests/lookup_harness.rs.
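
The subset check at the heart of such a harness might look like the sketch below. It is not the code in lookup_harness.rs: `Emission` and `missing_from` are hypothetical names, and messages are reduced to an already-encoded `u64` denominator with a signed multiplicity.

```rust
use std::collections::HashMap;

/// One emission: (multiplicity, encoded denominator). Column is ignored.
type Emission = (i64, u64);

/// Returns the expected pairs that are missing from `actual`, counting
/// duplicates. An empty result means `expected` is a multiset subset.
fn missing_from(actual: &[Emission], expected: &[Emission]) -> Vec<Emission> {
    let mut available: HashMap<Emission, usize> = HashMap::new();
    for &e in actual {
        *available.entry(e).or_insert(0) += 1;
    }
    let mut missing = Vec::new();
    for &e in expected {
        match available.get_mut(&e) {
            Some(count) if *count > 0 => *count -= 1,
            _ => missing.push(e),
        }
    }
    missing
}
```

Because the comparison never looks at which column an emission landed in, a test written this way keeps passing if interactions are later moved between columns.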

Most of the rewriting was Claude-assisted under time pressure, and that's the section of the PR most in need of close review. On the positive side, the harness is column-blind and subset-based, so these tests should survive any future column repack.

Prover changes are mostly nil

The prover itself barely changes. Almost all the logic moved into air/: the aux-trace generator is now just a thin wrapper (MidenLookupAuxBuilder) that calls into build_lookup_fractions + accumulate. The reduction comes from deleting the per-component aux-trace code in the processor — it's all delegated to a single point now.

Follow-ups

Landed since the original PR description:

  • Kernel ROM chiplet reworked onto multiplicity-based LogUp.
  • Deg annotation (name + degree) on every builder call.
  • Bus labels replaced by the BusId enum with direct prefix indexing.
  • Single-column partial-sum accumulator per trace (7 → 2 committed finals).
  • Prover-side op-flags fast path (MainLookupBuilder::build_op_flags override hook).

Outstanding, in rough priority order:

  • Revive per-component processor aux-trace tests. Today only the end-to-end oracle survives, which doesn't localise regressions.
  • Rewrite the bus design docs. docs/src/design/lookups/{multiset,logup}.md and per-chiplet pages still describe the old running-product world and now contradict the code.
  • Wire the real-trace bus debugger into the prover/processor paths. The infrastructure is there; today a silently-imbalanced bus just fails verification.
  • Unify boundary emission. Rust, ACE and MASM still carry three copies of the boundary formulas.
  • Padding-row early-exit hint for trace generation.
  • Cached-encoding optimisation for constraint eval. The hook is there; the widest use case (hasher responses) would save ~58% per design estimates.
  • OpFlagsTrace variant to avoid building all 128 flags polynomially on the trace-generation path.

FG's web intro

https://gisthost.github.io/?10dbb2d98054b6b78bc462e39ecd1929

Test plan

  • make test-fast followed by a targeted make test-air and make test-processor.
  • cargo test -p miden-air for the lookup validator + bus degree inventory.
  • End-to-end proof generation — pending the remaining trace-side integration.

adr1anh and others added 6 commits April 14, 2026 14:21
Squashed snapshot of 24 commits introducing the new closure-based LogUp
lookup-argument API in air/src/constraints/lookup/, plus its supporting
infrastructure (logup, logup_msg, the LookupAir / LookupBuilder /
LookupColumn / LookupGroup / LookupBatch trait stack, ConstraintLookupBuilder
and ProverLookupBuilder adapters, LookupChallenges, MidenLookupAir) and the
seven per-bus emitters in lookup/buses/.

Squashed for clean rebase onto adr1anh/constraint-simplification — the
intermediate commits referenced types (MainTraceRow, LiftedAirBuilder, the
old trace::Challenges) that 2856 reshapes, so a per-commit replay is not
viable. Granular history preserved on adr1anh/bus.pre-rebase-backup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Post-rebase Phase 3 of the lookup-module adaptation:

- Replace LookupChallenges struct with a type alias for trace::Challenges (the
  unified 2856 challenge type with fixed-size bus_prefix / beta_powers arrays).
- Migrate LookupMessage trait to use trace::Challenges in place of the dropped
  LookupChallenges. ConstraintLookupBuilder / ProverLookupBuilder / DualBuilder
  drop the max_message_width / num_bus_ids constructor args (sizes are now
  fixed at compile time via NUM_BUS_TYPES / MAX_MESSAGE_WIDTH).
- bus_prefix() trait method takes usize instead of u16; lookup::bus_id is now
  a thin re-export shim over trace::bus_types.
- Rewrite lookup/buses/{block_stack,block_hash_and_op_group,chiplet_requests,
  range_logcap,wiring}.rs against the typed MainCols struct: replace
  MainTraceRow indexed access with DecoderCols / StackCols / RangeCols /
  SystemCols field accessors, switch OpFlags::new from the
  ExprDecoderAccess wrapper to the (decoder, stack, decoder_next) signature
  (which folds end_next/repeat_next/halt_next into a single struct), and
  recover the ACE chiplet flag with the new virtual s0 = 1 - s_ctrl - s_perm
  formula.
- Add bus parameters to every Challenges::encode call in logup_msg.rs (each
  inherent Msg::encode now passes its bus type from trace::bus_types).
- Stub lookup/buses/{chiplet_responses,hash_kernel}.rs with no-op emit_*
  functions: the original implementations referenced the pre-2856 hasher
  chiplet (32-row P_CYCLE_ROW_0/31, hasher::flags::f_mv etc.) and need to be
  rewritten against the new hasher_control + permutation split + 16-row
  HasherPeriodicCols. Originals are preserved on adr1anh/bus.pre-rebase-backup.
- Delete dead constraints/logup.rs (the legacy RationalSet / Batch / Column
  algebra is no longer referenced; superseded by lookup module).
- Disable constraints/degree_audit.rs tests with cfg(any()) until they're
  ported to MainCols / OpFlags::new(...) / Challenges::encode(bus, [...]).

The two MidenLookupAir canary tests pass: the degree budget (with the two
stubbed columns counted) and the cached-encoding equivalence check.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…56 hasher

Replace the two stubbed bus emitters with full implementations against the
2856 controller/permutation hasher split:

- hash_kernel.rs: sibling table (BUS_SIBLING_TABLE) now uses the
  s_ctrl-gated (s0*s1*s2 / s0*s1*!s2) MU/MV pair from 2856's running
  product instead of the old 32-row cycle-row filter; the four absorb-
  after MVA/MUA variants are gone. ACE memory reads (BUS_CHIPLETS) keep
  the same NUM_ACE_SELECTORS-relative offsets. SiblingMsgBitZero/BitOne
  pick up mrupdate_id at beta^1 for cross-MRUPDATE domain separation,
  matching the SIBLING_TABLE encoding.

- chiplet_responses.rs: the 7 hasher response variants now mirror 2856's
  compute_hasher_response — sponge_start/sponge_respan replace the old
  f_bp/f_abp pair and use the new is_boundary trace column instead of
  cycle_row_0/cycle_row_31; mp/mv/mu/hout/sout keep their previous
  semantics under the new (s_ctrl, hs0..hs2, is_boundary) gating. The
  bitwise/memory/ACE/kernel-ROM responses use the runtime-muxed
  encoding through the existing *ResponseMsg types so the C1 transition
  stays at degree 8.

Selector wiring across both files mirrors lookup/buses/wiring.rs:
s_ctrl = chiplets[0], s_perm = perm_seg, virtual s0 = 1 - s_ctrl - s_perm,
and s1..s4 = chiplets[1..5]. Bitwise k_transition is read via the typed
PeriodicCols::bitwise.k_transition borrow.

Tests:
- miden_lookup_air_degree_within_budget passes (max deg 9, hash_kernel
  lands at 7, chiplet_responses at 8).
- miden_lookup_air_cached_encoding_equivalence passes (no cached calls
  on the two ported columns; chiplet_requests still exercises the
  dual-path machinery).

Also picks up an unrelated rustfmt sweep + dead-import cleanup from
running `make lint`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Collapse duplicated per-bus precompute into two shared contexts built once
per `MidenLookupAir::eval`:

- `MainTraceContext` holds a single `OpFlags` instance for the 4 main-trace
  buses (was built independently in each bus).
- `ChipletTraceContext` holds a `ChipletActiveFlags` snapshot mirroring the
  active-flag block of `build_chiplet_selectors` for the 3 chiplet-trace
  buses (was recomputed manually in each).

Per-bus cleanups: swap raw `chiplets[OFFSET + FIELD_IDX]` indexing for the
typed `local.bitwise/memory/ace/kernel_rom/controller` borrow views; drop
premature `_e: LB::Expr` named bindings in favor of `Var` captures; use
`.not()` (via `BoolNot`) for boolean negation of known-boolean columns;
delete the `CreqCtx` struct in `chiplet_requests.rs` and inline everything
into a flat `col.group` body (matching `block_hash_and_op_group`'s pattern,
dropping the cached-encoding optimization).

No behavioral change: the degree-budget and cached-encoding-equivalence
tests in `miden_air.rs` both pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…cleanups

Split the monolithic MidenLookupAir into MainLookupAir (4 main-trace
columns) and ChipletLookupAir (3 chiplet-trace columns) behind per-air
extension traits (MainLookupBuilder / ChipletLookupBuilder). The existing
MidenLookupAir is now a thin aggregator delegating to both, so the
degree-budget and cached-encoding tests keep working unchanged while a
future enum-dispatch wrapper can reach for either sub-air independently.

Each sub-air owns a concrete context struct (MainBusContext /
ChipletBusContext) built via a single builder hook (build_op_flags /
build_chiplet_active), so the prover adapter can later override the
polynomial construction with a boolean fast path without touching any
emitter. ConstraintLookupBuilder, ProverLookupBuilder (gated on F = Felt),
and DualBuilder each pick up empty impls of both extension traits.

Bus emitter cleanups:
- Hoist every message-producing computation out of the struct literal
  into closure-local let bindings; each emit_* closure now ends with a
  flat field-punned constructor.
- Replace Into::<LB::Expr>::into(x) turbofishes with x.into() (via typed
  bindings where arithmetic would otherwise be ambiguous).
- Replace .map(Into::into) / .map(|v| v.into()) with .map(LB::Expr::from)
  for concrete From-function-pointer conversion.
- Drop _e / _var suffixes via shadowing; drop redundant : LB::Expr
  annotations wherever the use site already pins the type.
- Introduce h_0 / h_1 bindings for the decoder's [Var; 8] rate halves
  and rate_0 / rate_1 / cap bindings for the chiplet controller's
  [Var; 12] sponge state ([RATE0, RATE1, CAP] convention).
- Replace the 7-bit op_bits fold in block_hash_and_op_group with
  horner_eval_bits.
- Move payload computation *inside* message closures so the prover path
  skips dead work on flag-zero rows.

Drop the scattered #[expect(dead_code)] / #[expect(unused_imports)]
attributes across the lookup tree in favor of a single module-level
#![allow(dead_code, unused_imports)] in lookup/mod.rs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Removes the entire Task #9 cleanup block that the file has been flagging
for deletion since the LookupMessage-trait rewrite (Task #12 / Amendment B).
All consumers in lookup/buses/*.rs reach for the structs directly and
encode via the LookupMessage trait from lookup/message.rs; the legacy
LogUpMessage trait plus its blanket impls had no live callers.

Deleted items (all flagged dead_code by clippy, all verified via grep to
have zero external references):

- `pub trait LogUpMessage` and all of its inherent `encode` method
  delegates on every *Msg type (HasherMsg, MemoryMsg, BitwiseMsg,
  BlockStackMsg, BlockHashMsg, OpGroupMsg, KernelRomMsg, AceInitMsg,
  RangeMsg, LogCapacityMsg, AceWireMsg, MemoryResponseMsg,
  KernelRomResponseMsg, BitwiseResponseMsg).
- `impl_logup_message!` macro + all 15 invocations.
- `struct SiblingMsg` (replaced by `SiblingMsgBitZero` / `SiblingMsgBitOne`
  in the new bus path).
- `struct OverflowMsg` + its LookupMessage impl (stack overflow bus is
  not emitted by the current 7-column LogUp argument).
- `KernelRomMsg::INIT_LABEL` const and `KernelRomMsg::init` constructor
  (legacy naming from before the header-as-builder pattern).
- Four legacy per-label memory response variants
  (MemoryResponseReadElementMsg, MemoryResponseWriteElementMsg,
  MemoryResponseReadWordMsg, MemoryResponseWriteWordMsg), replaced by
  the runtime-muxed MemoryResponseMsg.
- Two legacy per-op kernel ROM response variants
  (KernelRomResponseCallMsg, KernelRomResponseInitMsg), replaced by the
  muxed KernelRomResponseMsg.
- Two legacy per-op bitwise response variants (BitwiseResponseAndMsg,
  BitwiseResponseXorMsg), replaced by the muxed BitwiseResponseMsg.

Drops the now-unused `bus_types` import (all surviving code uses
`super::lookup::bus_id::BUS_*` directly).

File shrinks by 700 lines. `cargo check -p miden-air` finishes with zero
warnings (down from 18). `miden_lookup_air_degree_within_budget` and
`miden_lookup_air_cached_encoding_equivalence` both pass unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

adr1anh and others added 11 commits April 14, 2026 21:05
Demote 3 orphan doc comments (above section banners, documenting nothing)
to plain comments to silence clippy::empty_line_after_doc_comments, and
let rustfmt collapse empty `impl<E> Msg<E> {}` blocks and trim trailing
newlines. No behavioural change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Wires the prover-side LogUp aux-trace collection phase onto a single flat
buffer shared across all rows and columns, ready to feed a fused
batch-inversion + partial-sum pass as follow-up.

LookupAir trait gains `column_shape() -> &[usize]` — the
mutual-exclusion-aware upper bound on fractions any one row can push per
column. Each bus emitter declares its own `MAX_INTERACTIONS_PER_ROW`
const alongside the `emit_*` fn, with a comment explaining the max over
its mutually exclusive branches:

  M1 (block_stack + range_table)        = 3
  M_2+5 (block_hash_queue ∪ op_group)   = 7
  M3 (chiplet_requests)                 = 4
  M4 (range_stack + logpre_capacity)    = 4
  C1 (chiplet_responses)                = 1
  C2 (hash_kernel)                      = 1
  C3 (ace_wiring)                       = 3

Main/Chiplet/Miden LookupAirs expose the per-column shapes via static
arrays assembled from the per-emitter consts.

New `fractions.rs` introduces `LookupFractions<F, EF>`: a single flat
`fractions: Vec<(F, EF)>` and a single flat `counts: Vec<usize>` of
length `num_rows * num_cols`, row-major. No padding. Row r's contribution
to column c is `fractions[prefix .. prefix + counts[r * num_cols + c]]`,
where `prefix` is the running sum of earlier counts. `Vec::with_capacity`
sizes everything up front (`num_rows * Σ shape` for fractions,
`num_rows * num_cols` for counts) so the hot row loop never
re-allocates. `accumulate_slow` is the reference oracle — one
`try_inverse()` per fraction, walks the flat buffers with one cursor and
`counts.chunks(num_cols)` in lockstep, which matches the memory-access
pattern the fused fast path will use.
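
A toy mirror of the flat layout, to make the indexing concrete. `FlatFractions`, `cell` and `column_totals` are hypothetical names, a single `u64` toy field replaces the `(F, EF)` pair, and the walk returns only per-column totals rather than full aux columns.

```rust
const P: u64 = 2_147_483_647; // toy prime 2^31 - 1, NOT the VM's field

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn mul(a: u64, b: u64) -> u64 { a * b % P }
fn pow(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    while exp > 0 {
        if exp & 1 == 1 { acc = mul(acc, base); }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}
fn inv(a: u64) -> u64 { pow(a, P - 2) }

struct FlatFractions {
    num_cols: usize,
    fractions: Vec<(u64, u64)>, // (multiplicity, denominator), no padding
    counts: Vec<usize>,         // row-major, length num_rows * num_cols
}

impl FlatFractions {
    /// Fractions that row `r` pushed into column `c`.
    fn cell(&self, r: usize, c: usize) -> &[(u64, u64)] {
        let idx = r * self.num_cols + c;
        let prefix: usize = self.counts[..idx].iter().sum();
        &self.fractions[prefix..prefix + self.counts[idx]]
    }

    /// Reference walk: one cursor over `fractions`, one inversion per
    /// fraction, with rows of `counts` consumed in lockstep.
    fn column_totals(&self) -> Vec<u64> {
        let mut totals = vec![0; self.num_cols];
        let mut cursor = 0;
        for row_counts in self.counts.chunks(self.num_cols) {
            for (c, &n) in row_counts.iter().enumerate() {
                for &(m, d) in &self.fractions[cursor..cursor + n] {
                    totals[c] = add(totals[c], mul(m, inv(d)));
                }
                cursor += n;
            }
        }
        totals
    }
}
```

`cell` recomputes the prefix on every call and is only meant for spot checks; the sequential walk in `column_totals` never needs it because the cursor already is the running prefix.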

ProverLookupBuilder refactored to own `&mut LookupFractions`. `column(f)`
split-borrows `fractions` and `counts` disjointly, snapshots the vec
length before/after the closure, debug-asserts `pushed <= shape[col]`,
and appends the count. ProverColumn/Group/Batch push logic is unchanged —
they still thread a `&mut Vec<(F, EF)>` and call `.push((m, v))`.

Tests: two unit tests in `fractions.rs` (handcrafted `accumulate_slow`
round-trip + capacity reservation), one end-to-end smoke test in
`prover.rs` using a custom `SmokeAir` (2 columns, handcrafted eval body
that respects its own shape on every row — random row-data trips the
mutex-aware shape check on the real MidenLookupAir, so the smoke test
avoids that path).

Out of scope: fused batch-inversion + partial-sum fast path, rayon
row-loop parallelism.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds `build_lookup_fractions`, the top-level prover-side entry point that walks a
real `RowMajorMatrix<Felt>` through `ProverLookupBuilder` and returns a populated
`LookupFractions` buffer. Per row it builds a zero-copy two-row `RowWindow` over
the flat matrix storage with wraparound on the last row, composes the per-row
periodic slice by indexing each periodic column at `r mod its period` (reusing a
single `Vec<Felt>` across rows — no per-iteration allocation), and runs
`air.eval(&mut ProverLookupBuilder)`. The `for<'a> A: LookupAir<...>` HRTB lets
one call site work with any `'a` the row loop needs.
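
The per-row window and periodic-slice composition can be sketched roughly as follows. This is an illustration under assumptions: `u64` replaces `Felt`, and the `RowWindow` fields plus the helpers `window_at` and `fill_periodic` are hypothetical names, not the real types:

```rust
/// Hypothetical two-row view over row-major storage; it borrows and never copies.
struct RowWindow<'a> {
    current: &'a [u64],
    next: &'a [u64],
}

/// Builds the window for row `r`, wrapping the "next" row to row 0 on the last row.
fn window_at(values: &[u64], width: usize, r: usize) -> RowWindow<'_> {
    let num_rows = values.len() / width;
    let next = (r + 1) % num_rows;
    RowWindow {
        current: &values[r * width..(r + 1) * width],
        next: &values[next * width..(next + 1) * width],
    }
}

/// Refills `out` with each periodic column evaluated at `r mod its period`.
fn fill_periodic(columns: &[Vec<u64>], r: usize, out: &mut Vec<u64>) {
    out.clear(); // keeps capacity, so nothing is allocated after the first row
    out.extend(columns.iter().map(|col| col[r % col.len()]));
}
```

Reusing `out` across rows mirrors the "single `Vec<Felt>` across rows" point: `clear` resets the length while leaving the allocation in place.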

Public API surface for downstream crates:
- `miden_air::lookup` — new re-export module exposing `LookupChallenges`,
  `LookupFractions`, `MidenLookupAir`, `ProverLookupBuilder`, `accumulate_slow`,
  and `build_lookup_fractions`.
- `MidenLookupAir` relaxed from `pub(crate)` to `pub`.
- `constraints` stays private; only the items re-exported through `lookup` leak.

Integration test in `processor/src/trace/tests/lookup.rs`: runs a 6-op basic
block (`Pad/Add/Pad/Mul/Drop/...`) through `build_trace_from_ops`, pipes the
materialized main trace into `build_lookup_fractions` + `accumulate_slow`, and
asserts shape bookkeeping, non-degenerate collection, and zero-panic
accumulation. Observed on the tiny span: columns M1, M4, C2, C3 each close
(terminal value zero); columns M_2+5, M3, C1 don't close individually because
their buses are paired across different columns. Terminal values are printed to
stderr for manual inspection — hardening into `assert_eq!` checks is deferred
until the follow-up round-trip oracle lands.

Not yet covered by the test: symbolic AIR-constraint validation via
`miden_crypto::stark::debug::*`. That needs a full aux-column matrix in the
legacy shape, which sits behind the still-pending fused batch-inversion +
partial-sum fast path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds two independent computations of the per-row aux-column delta
`Σ m_i · d_i^{-1}` and a test that asserts they agree bit-exactly on a
real program's trace, closing the loop from `build_lookup_fractions`
through to the constraint-path algebra.

**Fused accumulator** (`fractions.rs::accumulate`): replaces the
per-fraction `accumulate_slow` with a single-inversion Montgomery-trick
path via `miden_core::field::batch_multiplicative_inverse` (already
chunked internally for ILP). One field inversion amortized over the
whole trace instead of N. Memory-access pattern is otherwise identical
to `accumulate_slow`, so both paths can coexist as cross-checks.
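
For readers unfamiliar with the Montgomery trick, here is a self-contained sketch over a toy prime field. The modulus `2^31 - 1` and the helpers `mul`, `inv`, and `batch_inverse` are assumptions chosen for illustration; the real code works over the VM's extension field via `batch_multiplicative_inverse`:

```rust
const P: u64 = 2_147_483_647; // toy prime 2^31 - 1, not the VM's field

fn mul(a: u64, b: u64) -> u64 {
    a * b % P // operands are below 2^31, so the product fits in u64
}

/// Single inversion via Fermat's little theorem: a^(P-2) mod P.
fn inv(a: u64) -> u64 {
    let (mut base, mut exp, mut acc) = (a % P, P - 2, 1u64);
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul(acc, base);
        }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

/// Inverts every element of `xs` with one call to `inv`; assumes all inputs are nonzero.
fn batch_inverse(xs: &[u64]) -> Vec<u64> {
    // Forward pass: prefix[i] = xs[0] * ... * xs[i - 1].
    let mut prefix = Vec::with_capacity(xs.len());
    let mut running = 1u64;
    for &x in xs {
        prefix.push(running);
        running = mul(running, x);
    }
    // The only real inversion: the inverse of the total product.
    let mut running_inv = inv(running);
    // Backward pass: peel off one factor at a time.
    let mut out = vec![0u64; xs.len()];
    for i in (0..xs.len()).rev() {
        out[i] = mul(running_inv, prefix[i]);
        running_inv = mul(running_inv, xs[i]);
    }
    out
}
```

The cost is roughly three multiplications per element plus one inversion, which is why amortizing over a whole trace (or a whole chunk) pays off.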

Unit test `accumulate_matches_accumulate_slow_random` drives a
deterministic LCG to build a random fixture (32 rows × shape [2,1,3])
and asserts `accumulate` produces output bit-exactly equal to
`accumulate_slow` per (col, row).

**Constraint-path oracle** (`column_oracle.rs`, new ~340 lines):
`ColumnOracleBuilder` implements `LookupBuilder` by running the same
`(U_g, V_g)` per-group algebra used by `ConstraintLookupBuilder` /
`DualGroup`, and folds groups into per-column `(U_col, V_col)` via the
standard cross-multiplication rule from
`ConstraintColumn::fold_group`:

  V_col_new = V_col · U_g + V_g · U_col
  U_col_new = U_col · U_g

starting from `(ONE, ZERO)`. The paired `collect_column_oracle_folds`
driver walks a `RowMajorMatrix<Felt>` row by row (mirroring
`build_lookup_fractions`) and returns per-row folded pairs for
downstream comparison. `OracleColumn` / `OracleGroup` / `OracleBatch`
mirror the `DualGroup` / `DualBatch` math verbatim, with `OracleGroup`
also implementing `EncodedLookupGroup` so the cached-encoding path
type-aligns (only the canonical closure runs — cached-encoding
equivalence is already covered by `miden_lookup_air_cached_encoding_equivalence`).
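
The cross-multiplication fold can be checked on a toy example. The sketch below assumes a small prime field (`2^31 - 1`) and represents each group as a hypothetical `(U_g, V_g)` pair of plain `u64` values; it is not the `ColumnOracleBuilder` code:

```rust
const P: u64 = 2_147_483_647; // toy prime 2^31 - 1, not the VM's field

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn mul(a: u64, b: u64) -> u64 { a * b % P }

/// Inversion via Fermat's little theorem: a^(P-2) mod P.
fn inv(a: u64) -> u64 {
    let (mut base, mut exp, mut acc) = (a % P, P - 2, 1u64);
    while exp > 0 {
        if exp & 1 == 1 { acc = mul(acc, base); }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

/// Folds groups given as `(U_g, V_g)` into one `(U_col, V_col)` pair,
/// starting from `(ONE, ZERO)`.
fn fold_groups(groups: &[(u64, u64)]) -> (u64, u64) {
    let (mut u_col, mut v_col) = (1u64, 0u64);
    for &(u_g, v_g) in groups {
        // V must be updated first, since it reads the old U_col.
        v_col = add(mul(v_col, u_g), mul(v_g, u_col));
        u_col = mul(u_col, u_g);
    }
    (u_col, v_col)
}
```

After folding, `V_col · U_col^{-1}` equals the plain sum of the per-group fractions `V_g / U_g`, which is the quantity the prover path accumulates row by row.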

**Cross-check integration test**
(`processor/src/trace/tests/lookup.rs::build_lookup_fractions_matches_constraint_path_oracle`):
runs the same Pad/Add/Pad/Mul/Drop span as the plumbing smoke test
through both paths and asserts
`aux[col][r+1] - aux[col][r] == V_col · U_col^{-1}` for every
(row, col). Catches divergence anywhere in the pipeline — bus
emitters, flag handling, message encoding, collection, batched
inversion, running-sum accumulation, constraint-path group fold. On
the tiny span the prover path and the oracle agree bit-exactly across
all 7 columns and every row.

**Public API**: `miden_air::lookup` re-exports `accumulate`,
`ColumnOracleBuilder`, `collect_column_oracle_folds`, and `LookupAir`
alongside the existing collection-phase surface.

Out of scope (follow-up milestones):
- Wiring `MidenLookupAir::eval` into `ProcessorAir::eval` and flipping
  `AUX_TRACE_WIDTH` to retire the legacy multiset path.
- End-to-end prove/verify test against a real program.

Verification: `cargo test -p miden-air --lib constraints::lookup` (6/6),
`cargo test -p miden-processor --lib trace::tests::lookup` (2/2),
`make test-fast` 2890 passed, `make lint` clean,
`cargo clippy -p miden-air -p miden-processor --all-targets` zero
warnings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces `fractions.rs::accumulate` with a row-major, chunk-parallel,
fused-inversion variant. The old path ran three disjoint full sweeps
(extract denominators, global `batch_multiplicative_inverse`, per-column
walk) into a fragmented `Vec<Vec<EF>>`, with the accumulation walk fully
sequential even under `concurrent`. On realistic traces the walk is the
dominant wall-clock cost, so parallelizing it was the real win.

**New shape**: one contiguous `RowMajorMatrix<EF>` of `(num_rows + 1) *
num_cols` elements. Row 0 is the zero initial condition; row `r + 1`
column `c` holds the running sum of `m_i · d_i⁻¹` through main-trace row
`r`. Within a row all columns live in adjacent memory, so the per-row
inner loop is a tight sequential write.

**Algorithm**:

1. *Prepass* — sequential O(num_rows · num_cols) `usize` scan builds
   `row_frac_offsets` so any row range `[lo, hi)` can locate its flat
   fraction slice in O(1).
2. *Phase 1 — chunked fused walk* (parallel under `concurrent`, else
   serial): rows are split into groups of `ACCUMULATE_ROWS_PER_CHUNK =
   512` (mirroring `trace::main_trace::ROW_MAJOR_CHUNK_SIZE`). Each chunk
   runs an inline Montgomery inversion into a chunk-local scratch `Vec`
   via the new `invert_denoms_in_place` helper — forward prefix-product
   pass, one real `try_inverse`, backward sweep that overwrites scratch
   with per-element inverses — then walks its rows forward maintaining a
   `num_cols`-wide running register, writing local prefix sums (starting
   from ZERO) directly into its output slice and emitting per-column
   totals.
3. *Phase 2 — sequential exclusive prefix scan* over
   `num_chunks · num_cols` chunk totals produces the per-chunk global
   offset each local prefix needs. Cheap (summary data only, no
   inversion) and the single serial-dependency point in the whole
   algorithm.
4. *Phase 3 — parallel offset add* (parallel under `concurrent`): second
   pass over the same row-chunked output slices in lockstep with
   `chunk_offsets`, adding the per-column offset into every row. Chunk 0
   is skipped (zero offset). Memory-bandwidth limited, trivially
   parallel, no allocation, no inversion.

Because partial sums are additive (not multiplicative), phases 2 and 3
collapse to a prefix scan + vector add instead of the global rescale a
product-prefix formulation would need.
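
The three phases can be sketched serially as below. This is a simplified illustration, not the actual `accumulate`: `wrapping_add` on `u64` stands in for field addition, the inversion step is omitted (the input is already the per-row delta), and the function name and signature are hypothetical:

```rust
/// Row-major running sums in three phases.
/// `deltas` holds `num_rows * num_cols` per-row contributions; the result has
/// `(num_rows + 1) * num_cols` entries and its first row is zero.
/// Assumes `num_cols > 0` and `rows_per_chunk > 0`.
fn chunked_prefix_sums(deltas: &[u64], num_cols: usize, rows_per_chunk: usize) -> Vec<u64> {
    let num_rows = deltas.len() / num_cols;
    let mut out = vec![0u64; (num_rows + 1) * num_cols];
    let chunk_len = rows_per_chunk * num_cols;

    // Phase 1: local prefix sums per chunk (starting from zero) plus per-chunk totals.
    let mut totals: Vec<Vec<u64>> = Vec::new();
    let body = &mut out[num_cols..]; // skip the zero initial row
    for (out_chunk, in_chunk) in body.chunks_mut(chunk_len).zip(deltas.chunks(chunk_len)) {
        let mut acc = vec![0u64; num_cols];
        for (out_row, in_row) in out_chunk.chunks_mut(num_cols).zip(in_chunk.chunks(num_cols)) {
            for c in 0..num_cols {
                acc[c] = acc[c].wrapping_add(in_row[c]);
                out_row[c] = acc[c];
            }
        }
        totals.push(acc);
    }

    // Phase 2: sequential exclusive scan over the chunk totals.
    let mut offsets = Vec::with_capacity(totals.len());
    let mut running = vec![0u64; num_cols];
    for t in &totals {
        offsets.push(running.clone());
        for c in 0..num_cols {
            running[c] = running[c].wrapping_add(t[c]);
        }
    }

    // Phase 3: add each chunk's offset to its rows; chunk 0 is skipped (zero offset).
    for (out_chunk, off) in out[num_cols..].chunks_mut(chunk_len).zip(&offsets).skip(1) {
        for out_row in out_chunk.chunks_mut(num_cols) {
            for c in 0..num_cols {
                out_row[c] = out_row[c].wrapping_add(off[c]);
            }
        }
    }
    out
}
```

Phases 1 and 3 touch disjoint chunks, so each loop over `chunks_mut` is the natural place for a parallel iterator; only phase 2 carries a serial dependency.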

**Allocation discipline**:

- Output: one `Vec<EF>` of length `(num_rows + 1) * num_cols`.
- Per-chunk scratch `Vec<EF>` sized to that chunk's fraction count
  (~1.5 K EFs ≈ 24 KiB at the default tuning, L1-resident). Allocated
  once per chunk inside the closure; rayon's work stealing amortizes
  across threads.
- `row_frac_offsets` / `chunk_totals` / `chunk_offsets`: small summary
  buffers, `usize` or `num_chunks · num_cols` EFs.
- No more `denoms: Vec<EF>` or `d_inv: Vec<EF>` intermediates, and no
  call into `batch_multiplicative_inverse` (which allocates a fresh
  `Vec` and internally parallelizes, which would nest poorly inside our
  own `par_chunks_mut`).

**Feature gating**: follows `air/src/trace/main_trace.rs::to_row_major`
exactly. `#[cfg(feature = "concurrent")]` selects `par_chunks_mut` over
`chunks_mut`, and the phase 1 and phase 3 bodies are otherwise identical. Serial
fallback is still chunked + fused + bounded-scratch, so strictly better
than the old global-Vec path even without rayon.

`accumulate_slow` is unchanged — it stays the `Vec<Vec<EF>>` reference
oracle.

**Tests** (`air/src/constraints/lookup/fractions.rs`):

- `accumulate_matches_accumulate_slow_random` — single-chunk random
  fixture (32 rows), exercises only phase 1.
- `accumulate_multi_chunk_matches_accumulate_slow` *(new)* — random
  fixture at `ACCUMULATE_ROWS_PER_CHUNK * 3 + 7` rows, deliberately
  picking a trailing short chunk so phase 2's prefix scan and phase 3's
  offset add are both exercised and any off-by-one in last-chunk bounds
  is caught. This is the key regression test — the single-chunk path
  can't cover phases 2 or 3.
- `accumulate_empty_trace` *(new)* — `num_rows = 0` returns a 1-row,
  `num_cols`-wide zero matrix without touching the inversion path.
- Shared `Lcg` / `random_fixture` / `assert_matrix_matches_slow` helpers
  factored out of the original single-chunk test.

`processor/src/trace/tests/lookup.rs::build_lookup_fractions_matches_constraint_path_oracle`
is migrated to read deltas from the matrix (`aux.values[(r + 1) * width
+ col] - aux.values[r * width + col]`) instead of the old
`aux[col][r + 1] - aux[col][r]` Vec-of-Vec indexing.

**Out of scope** (follow-up milestones from commit 64de7a9):
- Wiring the matrix output into `ProcessorAir::eval` and flipping
  `AUX_TRACE_WIDTH` to retire the legacy multiset path.

Verification: `cargo test -p miden-air --lib constraints::lookup` (8/8,
includes `accumulate_multi_chunk_matches_accumulate_slow`);
`cargo test -p miden-processor --lib trace::tests::lookup` (2/2);
`make test-fast` 2892/2892; `cargo clippy -p miden-air --all-targets`
zero warnings with and without `--features concurrent`; `make lint`
clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire ProcessorAir through the closure-based MidenLookupAir/LogUp lookup
argument by introducing a stateless `MidenLookupAuxBuilder` ZST and
swapping the legacy multiset `AuxTraceBuilders` out of `prove_stark`.

The new builder runs `build_lookup_fractions` + `accumulate` per call
and slices the (num_rows + 1)-row accumulator into the AuxBuilder return
shape (aux trace = first num_rows rows starting at ZERO; committed
finals = last row holding the full running sum). Periodic columns are
recomputed from `PeriodicCols::periodic_columns()` and challenges are
constructed via `LookupChallenges::new(α, β)`, mirroring the
constraint-path adapter exactly.

ProcessorAir::eval now invokes `MidenLookupAir.eval` through a
`ConstraintLookupBuilder` instead of the legacy `enforce_bus`, and
`aux_width()` / `num_aux_values()` report the new 7-column LogUp width.

Milestone B intentionally stubs three things to land the integration
plumbing without touching the open-bus algebra:

- LogUp boundary checks (when_first_row / when_last_row) are commented
  out in `ConstraintColumn::column`; the per-row transition constraint
  Δ·U − V = 0 stays live.
- `ProcessorAir::reduced_aux_values` returns the verifier identity
  (prod = ONE, sum = ZERO) — the only safe stub since the verifier
  asserts `is_identity()` on the accumulated reduction.
- `ace::reduced_aux_batch_config()` returns an empty config, collapsing
  the batched ACE circuit to `constraint_check + γ·(1−1) + γ²·0 =
  constraint_check` (verified `batch_reduced_aux_values` handles the
  empty case cleanly).

`ExecutionTrace::check_constraints` is rewired to the new builder so
the debug constraint checker stops feeding the AIR an 8-column legacy
aux trace, and `test_poseidon2_prove_verify` now skips the recursive
verify step (the recursive verifier MASM still depends on the legacy
boundary terms; restoration is the follow-up milestone). `RELATION_DIGEST`
is regenerated alongside `constraints_eval.masm` so the snapshot test
and AIR↔MASM transcript binding stay consistent.

Verified with the cross-check oracle test, full miden-air + miden-processor
lib suites, and `test_blake3_256_prove_verify` end-to-end.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rip out ~3,000 lines of legacy multiset bus / aux trace code now that
`MidenLookupAuxBuilder` and `MidenLookupAir` are wired through
`prove_stark` and `ProcessorAir::eval`.

Air side:
- Delete `air/src/constraints/{decoder,stack,range,chiplets/bus*}.rs`
  — the per-component bus constraint files. Drop `pub mod bus`
  declarations from each parent module and the orphaned
  `air/src/constraints/bus.rs` index.
- Delete `enforce_bus`, `enforce_bus_first_row`, `enforce_bus_last_row`
  from `air/src/constraints/mod.rs`.
- Delete the orphaned `air/src/trace/aux_trace.rs` (an unused
  dependency-inversion trait that never had a real implementor).
- Collapse the legacy `AUX_TRACE_*_OFFSET` / `AUX_TRACE_WIDTH = 8`
  layout in `air/src/trace/mod.rs` down to
  `AUX_TRACE_WIDTH = LOGUP_AUX_TRACE_WIDTH (= 7)`. Strip dead column-index
  constants (`P1_COL_IDX`, `P2_COL_IDX`, `P3_COL_IDX`, `B_RANGE_COL_IDX`,
  hasher `P1_COL_IDX`) from the per-component trace modules.
- Delete the legacy `AuxCols<T>` struct + `NUM_AUX_COLS` + `AUX_COL_MAP`
  + `aux_col_map` test in `air/src/constraints/columns.rs`. Remove the
  unused `F_5` / `F_6` field constants left behind by the bus deletion.

Processor side:
- Delete the four legacy aux trace builders:
  `processor/src/trace/{decoder,stack,range,chiplets}/aux_trace*`
  (decoder block stack/hash/op group tables, stack overflow, range checker,
  chiplets hash kernel / bitwise / memory / ACE / virtual table / wiring bus)
- Delete `processor/src/debug.rs` (`BusMessage` trait + `BusDebugger`
  struct, only used by the deleted aux builders) and the dead
  `AuxColumnBuilder` trait in `processor/src/trace/utils.rs`.
- Delete `AuxTraceBuilders` struct, its `build_aux_columns` impl, and
  its `AuxBuilder` trait impl from `processor/src/trace/mod.rs`. Drop
  the field, accessor, and constructor parameter from `ExecutionTrace`.
  Delete the orphaned `ExecutionTrace::build_aux_trace` helper.
- Strip the dead `AceHints` / `EvaluatedCircuitsMetadata` infrastructure
  from `processor/src/trace/chiplets/ace/mod.rs` (the wiring-bus
  metadata + divisor-construction loop only existed for the legacy
  ACE wiring bus aux builder). The ACE chiplet `fill_trace` is now a
  pure trace filler with no side-channel return, and runs in the same
  parallel rayon scope as the other chiplets.
- Delete the legacy `processor/src/trace/tests/{decoder,stack,range,
  hasher,chiplets/}` aux trace builder tests. The cross-check oracle
  test in `processor/src/trace/tests/lookup.rs` stays — it remains the
  algebraic safety net for the LogUp collection pipeline.

Verified with `cargo check --workspace --all-targets`, the cross-check
oracle test, full miden-air + miden-processor lib suites (46 + 1466
passing), and `test_blake3_256_prove_verify` end-to-end.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sweep up the leftover dead code from the multiset deletion: a duplicate
`build_op_group` helper in the decoder test surface (shadowed by a
local copy in `core_trace_fragment/tests.rs`), the unused
`build_span_with_respan_ops` and `build_trace_from_ops_with_inputs`
test helpers, and a handful of imports left behind in `parallel/tests`,
`trace/utils`, and `trace/tests/mod.rs` after the legacy aux trace
tests were removed. The aux-trace-builder comparison block in
`parallel/tests::test_run_with_fragments_matches_single_fragment` is
also dropped since `ExecutionTrace::build_aux_trace` no longer exists
and the deterministic-trace snapshot still covers the same equality.

Workspace is now warning-clean apart from the pre-existing unused
`mstream` / `pipe` op-flag getter warning in `air/src/constraints/op_flags`,
which is unrelated to this milestone.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Start re-populating the `processor/src/trace/tests/` tree that commit
974cde5 cleared alongside the legacy multiset bus. Introduces a shared
`LookupHarness` that wraps `build_lookup_fractions` + `accumulate` with
per-row delta and terminal accessors, keyed by `aux_col::*` column
constants. Tests hand-construct expected `LookupMessage` instances via
the `logup_msg` structs and assert `aux[r+1][col] - aux[r][col]` matches
`-1 / msg.encode(&challenges)` at each request row — a strictly stronger
successor to the old `b_chip[r]` step-by-step walk.

`chiplets/bitwise.rs` is the representative restoration, covering
U32and/U32xor on `CHIPLET_REQUESTS` (M3). Layer 3 (column closure)
is intentionally deferred: the `diagnostic_multi_batch_terminals` test
in `tests/lookup.rs` documents that columns M_2+5, M3, and C1 carry
open boundary contributions until the LogUp boundary wiring lands.

Air-side: re-exports `LookupMessage` + the `logup_msg` module from the
previously-internal `constraints::lookup` / `constraints::logup_msg`
paths so the processor tests can encode messages without leaking other
internals. Test helpers `build_span_with_respan_ops` and
`build_trace_from_ops_with_inputs` are restored as well since the
follow-up decoder / hasher / memory / range / sibling-table
restorations depend on them.

A `TODO(logup-overflow)` note in `trace/tests/mod.rs` points at
`git show 974cde5^:processor/src/trace/tests/stack.rs` for the
deferred stack-overflow-table test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds the two missing buses to MidenLookupAir, completing the migration
of legacy multiset arguments to LogUp:

- M5 / BUS_STACK_OVERFLOW_TABLE — three interactions on (clk, val, prev):
  right_shift add, left_shift∧overflow remove, dyncall∧overflow remove
  (DYNCALL uses hasher_state[5] for the new overflow pointer since b1'
  is reset by the call). Matches the legacy stack overflow bus exactly.

- BUS_HASHER_PERM_LINK on the shared C3 v_wiring column — binds hasher
  controller rows to permutation sub-chiplet rows via four interactions
  on (label, state[0..12]): controller input/output (+1, label 0/1),
  perm row 0 / row 15 (-m, label 0/1) where m is read from
  PermutationCols.multiplicity. Closes the structural gap that let the
  permutation segment run independently from the controller.

Sharing v_wiring with ACE wiring is sound because the two buses use
distinct bus_prefix[bus] additive bases — their fractions are linearly
independent in the extension field and cannot interfere on the shared
running accumulator. Renames emit_ace_wiring → emit_v_wiring and bumps
the column's MAX_INTERACTIONS_PER_ROW from 3 to 4.
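
To illustrate how distinct additive bases keep two buses apart on one column, here is a rough sketch. The encoding form `bus_prefix[bus] + Σ β^i · payload[i]`, the toy modulus, and the function `encode` are all assumptions made for this example; the real encoding lives in the lookup message types and may differ in detail:

```rust
const P: u64 = 2_147_483_647; // toy prime 2^31 - 1, not the VM's field

/// Hypothetical encoding: `bus_prefix[bus] + sum_i beta^i * payload[i]` (mod P).
fn encode(bus_prefix: &[u64], bus: usize, beta: u64, payload: &[u64]) -> u64 {
    let mut acc = bus_prefix[bus] % P;
    let mut beta_pow = 1u64;
    for &v in payload {
        acc = (acc + beta_pow * (v % P)) % P;
        beta_pow = beta_pow * (beta % P) % P;
    }
    acc
}
```

With the same payload, two buses produce denominators that differ by the difference of their prefixes, so a fraction emitted on one bus cannot cancel against a fraction emitted on the other.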

ChipletActiveFlags gains a permutation field (= s_perm = perm_seg) so
the perm-link emitter can gate its perm-row interactions without
reaching past the chiplet active-flag abstraction.

LOGUP_AUX_TRACE_WIDTH grows 7 → 8 (one new M5 column; the perm-link
adds no new column thanks to the v_wiring share). MAIN_COLUMN_SHAPE
grows to 5 entries; MIDEN_COLUMN_SHAPE to 8.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Restore the LogUp boundary identity check on top of the Milestone B
stateless `MidenLookupAuxBuilder` integration, so `prove_verify` round-
trips close algebraically rather than by stub. Three threads of work:

**LogUp boundary constraints.** Re-enable the `when_first_row` and
`when_transition` assertions inside
`ConstraintColumn::column` (the transition was already active; the
first-row `acc == 0` is new). The symmetric `when_last_row` (binding
`aux[N-1]` to the committed final) stays commented out — it exceeds
the degree-9 budget on M_2+5 even though it mirrors the transition
shape; soundness of the verifier-side boundary check now relies on the
committed finals being honestly derived from the aux trace, to be
tightened in a follow-up.

**`ProcessorAir::reduced_aux_values`.** Restore the real boundary
formula: `sum = Σ aux_finals[col] + total_correction` where
`total_correction` cancels the three open buses:

- `c_block_hash = +1 / encode(BLOCK_HASH_TABLE, [0, ph[0..4], 0, 0])`
  — the decoder's END on the root block emits one unmatched `remove`.
- `c_log_precompile = 1/d_initial − 1/d_final` — the LOGPRECOMPILE
  transcript chain telescopes to one unmatched remove (initial state)
  and one unmatched add (final state).
- `c_kernel_rom = −Σ 1 / d_kernel_proc_msg_i over VLPI[0]` — the
  kernel ROM chiplet emits an `add` per kernel proc init, matched by
  the verifier via variable-length public inputs.

The four helper functions (`program_hash_message`,
`transcript_messages`, `kernel_proc_message`,
`kernel_logup_correction_from_var_len`) come out of their
`#[allow(dead_code)]` wrapper and get live callers again.

**Bus closure bug fixes** (found via parallel planning sessions that
added `BUSDBG` instrumentation to the prover and a Python groupby-d
analysis):

- Memory chiplet word-index range checks (`w0`, `w1`, `w1<<2` from
  `memory/mod.rs:291`) now have consumers on M4 via
  `block_stack_and_range_logcap.rs`, closing the M1↔M4 range_check
  gap. The file merges the legacy `block_stack.rs` + `range_logcap.rs`
  emitters into one so the range-check response and request sides live
  together.
- The RESPAN request in `chiplet_requests.rs` used `addr_next - 1` for
  the hasher parent; fixed to `addr_next` directly (matches the
  controller/perm split — the next row's decoder `addr` already points
  at the continuation input row).
- MSTREAM and PIPE chiplet bus requests were missing entirely; added
  two-word memory reads/writes and dropped the `TODO` / `dead_code`
  markers on `op_flags::{mstream,pipe}`.

**Width bump + test updates.** `LOGUP_AUX_TRACE_WIDTH` is 8 (5 main +
3 chiplet); test harnesses and shape assertions in
`processor/src/trace/tests/lookup.rs`, `lookup_harness.rs`, and the
bitwise chiplet bus test get the new value imported from `miden_air`.
The `aux_col` module in `lookup_harness.rs` collapses `HASHER_PERM_LINK`
into the shared `V_WIRING` slot (col 7), matching the 2b81a86c1 decision
to ride the perm-link bus on the ACE-wiring column via distinct
`bus_prefix` bases.

**Snapshot regeneration.** `RELATION_DIGEST` in `config.rs` and
`mod.masm`, plus the insta snapshot and `constraints_eval.masm`, are
regenerated to reflect the new constraint system
(`num_eval_gates: 5520`).

Verified on `test_blake3_256_prove_verify` end-to-end (debug print
shows `residual = [0, 0]`), the full non-recursive `prove_verify` suite
(10 / 10 passing across Blake3/Keccak/Rpo/Poseidon2/Rpx + both proptests
+ fast-parallel variants), the cross-check oracle, and
`miden-air --lib` (including the snapshot test and the degree-budget
test).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@adr1anh changed the base branch from next to adr1anh/constraint-simplification April 15, 2026 03:23
adr1anh and others added 5 commits April 15, 2026 11:51
Two small CI fixups surfaced by the `check all feature combinations`
and `Build Documentation` jobs on `adr1anh/bus`:

- `air/src/constraints/lookup/miden_air.rs::SeededRng::next_felt` used
  `miden_crypto::rand::test_utils::prng_value`, which lives behind a
  feature flag not enabled under `cargo check --all-targets
  --no-default-features --manifest-path air/Cargo.toml`. Replaced the
  call with an inline SplitMix64 mixer on `(seed, counter)` — still
  reproducible across runs, still sufficient for the cached-encoding
  equivalence test, no external dep on miden-crypto's test-utils.

- `docs/src/design/bus_constraint_inventory.md` lines 993 and 1034 had
  bare-text formulas `deg(...) <= D_max` and
  `current_column_degree + cost(...) <= D_max`. Docusaurus' MDX parser
  sees the `<` and starts looking for a JSX tag, then chokes on the
  bare `=`. Wrapped both in inline backticks so MDX treats them as
  code and ignores the `<`.
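
For reference, a SplitMix64-style mixer on `(seed, counter)` as mentioned in the first fixup could look like the sketch below. The constants are the standard SplitMix64 ones; the way the state is derived from `(seed, counter)` and the function name are assumptions, and the reduction of the output to a field element is left out:

```rust
/// SplitMix64 finalizer applied to a state derived from `(seed, counter)`.
/// Deriving the state as `seed + (counter + 1) * GAMMA` is an assumption of this
/// sketch; it matches stepping a SplitMix64 generator `counter + 1` times.
fn splitmix64(seed: u64, counter: u64) -> u64 {
    const GAMMA: u64 = 0x9E37_79B9_7F4A_7C15;
    let mut z = seed.wrapping_add(counter.wrapping_add(1).wrapping_mul(GAMMA));
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}
```

The function is pure, so the same `(seed, counter)` pair always yields the same value, which is the reproducibility property the test relies on.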

Verified with `cargo check -p miden-air --all-targets --no-default-features`
(clean) and `cargo test -p miden-air --lib miden_lookup_air_cached_encoding_equivalence`
(passes with the new PRNG).
Move the four internal design docs authored during the all-LogUp bus
refactor out of `docs/src/design/` — they're working notes, not
user-facing documentation, and Docusaurus auto-indexes the folder so
they'd end up shipped on the site if left in place:

- `bus_api_design_notes.md` — closure-based `LookupBuilder` API
  rationale and trait-stack layout sketch.
- `bus_constraint_inventory.md` — per-bus constraint audit from the
  legacy multiset model, plus the column-packing degree-budget
  worksheet that drove the `MidenLookupAir` shape.
- `bus_packing_summary.md` — distilled packing decisions per column.
- `lookup_air_plan.md` — task-by-task execution log for the refactor.

Copies are kept locally under
`~/.claude/projects/-Users-adrian-Developer-miden-vm/memory/pr_bus_notes/`
for future reference; this commit only removes them from the tree.

Also drops the two dangling doc-comment references to
`docs/src/design/lookup_air_plan.md` in
`air/src/constraints/lookup/{mod,builder}.rs`.
Replace the Milestone-B empty stub in `reduced_aux_batch_config` with a
real LogUp boundary identity check in the ACE circuit DAG:

    0 = Σ aux_bound[0..7] + c_block_hash + c_log_precompile + c_kernel_rom

The two fixed-length corrections (block_hash, log_precompile) are rebuilt
inside the DAG as rational fractions `(±1, d)` and folded into a running
`(N, D)` pair — no in-circuit inversion needed. The kernel-ROM correction
is computed in MASM via `ext2inv + ext2add` accumulation and handed in as
a scalar via `VlpiReduction(0)`. The final boundary check is the quadratic
identity `(Σ aux_bound + c_kr) · D + N = 0`, batched with γ against the
constraint evaluation.
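
The inversion-free form of the boundary check can be illustrated on a toy field. In this sketch `sum` stands for `Σ aux_bound + c_kr`, the modulus `2^31 - 1` is a placeholder, and `fold_corrections` / `boundary_closes` are hypothetical helper names rather than functions from `air/src/ace.rs`:

```rust
const P: u64 = 2_147_483_647; // toy prime 2^31 - 1, not the VM's field

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn mul(a: u64, b: u64) -> u64 { a * b % P }

/// Inversion via Fermat's little theorem; used only to build test data.
fn inv(a: u64) -> u64 {
    let (mut base, mut exp, mut acc) = (a % P, P - 2, 1u64);
    while exp > 0 {
        if exp & 1 == 1 { acc = mul(acc, base); }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

/// Folds signed unit fractions `(+1 or -1) / d` into a running `(N, D)` pair
/// without ever inverting a denominator.
fn fold_corrections(terms: &[(bool, u64)]) -> (u64, u64) {
    let (mut n, mut d_acc) = (0u64, 1u64);
    for &(positive, d) in terms {
        let sign = if positive { 1 } else { P - 1 };
        n = add(mul(n, d), mul(sign, d_acc));
        d_acc = mul(d_acc, d);
    }
    (n, d_acc)
}

/// Checks `sum + N / D == 0` in the quadratic form `sum * D + N == 0`.
fn boundary_closes(sum: u64, n: u64, d: u64) -> bool {
    add(mul(sum, d), n) == 0
}
```

Multiplying through by `D` is what lets the circuit avoid an in-circuit inversion: both sides are polynomial in the denominators.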

Key changes:

- **air/src/ace.rs**: new `LogUpBoundaryConfig` / `batch_logup_boundary` /
  `logup_boundary_config` replacing the old `ReducedAuxBatchConfig` stub.
  Reuses existing `encode_bus_message` for in-DAG message construction.

- **crates/ace-codegen/src/circuit.rs**: encoder now appends a trailing
  `add(root, 0)` when the DAG root isn't the last emitted op, fixing a
  Rust/MASM evaluator disagreement (the ACE chiplet reads "last wire" as
  output, not a logical root pointer).

- **crates/lib/core/asm/sys/vm/public_inputs.masm**: `reduce_kernel_digests`
  changed from multiplicative product (`ext2mul`) to sum-of-reciprocals
  (`ext2inv + ext2add + ext2neg`), matching `kernel_logup_correction_from_var_len`
  in the Rust verifier.

- Regenerated `constraints_eval.masm`, `RELATION_DIGEST`, and insta snapshot.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Merge the three encoding-primitive methods (beta_powers, bus_prefix,
insert_encoded) from the separate EncodedLookupGroup trait into
LookupGroup with panicking default implementations. This simplifies the
trait hierarchy by removing the EncodedGroup GAT from LookupColumn and
using a single Group type for both closures of
group_with_cached_encoding.

The prover path now unconditionally panics on encoding-primitive calls
(previously it used debug_assert, which is compiled out of release builds).
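
The "panicking default implementation" pattern looks roughly like this in isolation. The trait below is a heavily simplified, hypothetical version with a single encoding primitive; the real `LookupGroup` has more methods and generic parameters:

```rust
/// Hypothetical, simplified shape of the merged trait.
trait LookupGroup {
    /// Canonical path, implemented by every group.
    fn add(&mut self, multiplicity: i64, denominator: u64);

    /// Encoding primitive with a panicking default; only groups that support
    /// cached encodings override it.
    fn insert_encoded(&mut self, _multiplicity: i64, _encoded: u64) {
        panic!("insert_encoded is not supported by this group");
    }
}

/// Prover-style group: keeps the default, so encoding calls panic in every build.
struct ProverGroup {
    fractions: Vec<(i64, u64)>,
}

impl LookupGroup for ProverGroup {
    fn add(&mut self, multiplicity: i64, denominator: u64) {
        self.fractions.push((multiplicity, denominator));
    }
}

/// Constraint-style group: overrides the encoding primitive.
struct ConstraintGroup {
    encoded: Vec<(i64, u64)>,
}

impl LookupGroup for ConstraintGroup {
    fn add(&mut self, multiplicity: i64, denominator: u64) {
        self.encoded.push((multiplicity, denominator));
    }

    fn insert_encoded(&mut self, multiplicity: i64, encoded: u64) {
        self.encoded.push((multiplicity, encoded));
    }
}
```

Because both closures of a cached-encoding call now see the same trait, no associated group type is needed just to expose the extra methods.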

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Base automatically changed from adr1anh/constraint-simplification to next April 16, 2026 15:51
adr1anh and others added 29 commits April 23, 2026 00:17
Clarify the emitter count (seven buses + lookup_op_flags helper, not eight)
and describe emit_miden_boundary as a function rather than lumping it in
with the column-shape/final-count constants.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace "instead of legacy selector-derived labels" phrasing (which only
makes sense during the refactor) with a descriptive note pointing at the
BusId enum as the domain carrier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace hand-written BUS_BLOCK_STACK_TABLE / BUS_RANGE_CHECK / BUS_ACE_WIRING
etc. references with the matching BusId::* enum variants, so greps land on
real code instead of retired labels.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both MainLookupAir and ChipletLookupAir now return MIDEN_MAX_MESSAGE_WIDTH
(= 16) so every LookupAir implementor in this crate agrees with the
aggregator on the bus-prefix table width. The widest live payload is
15 slots (HasherMsg::State); the 16th slot is the existing MASM-alignment
slack kept at the aggregator level.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Only the chiplet-bus emitters and the ChipletBusContext inside
crate::constraints::lookup use this type; make its visibility match.
pub(in crate::constraints::lookup) would be tighter, but ChipletBusContext
(pub(crate)) re-exposes it, so pub(crate) is the narrowest level Rust
allows here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
block_hash_and_op_group::{self as block_hash_and_op_group, ...} was a no-op
rename. Collapse to the sibling form used by the other emitter imports.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both sub-AIRs have unique eval signatures at their concrete types; drop the
<MainLookupAir as LookupAir<LB>>::eval(...) qualifier in favor of method
syntax. Matches project preference for top-of-file imports over inline
qualified paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All seven hasher response variants (sponge_start / sponge_respan / mp_verify
/ mr_update_{old,new} / hout / sout) encode their bus row at clk + 1. Hoist
the expression to a single let-binding above the column closure and clone
into each branch instead of rebuilding it seven times.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MP_VERIFY / MR_UPDATE_OLD / MR_UPDATE_NEW inputs emit the same leaf-word
encoding differing only in (name, flag, BusId). Collapse the three
structurally identical g.add() bodies into a single for loop so future
changes to the leaf-word derivation apply in one place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SiblingMsgBitZero and SiblingMsgBitOne encoded identical data with different
β positions; merge into a single SiblingMsg<E> tagged by a SiblingBit enum
and let encode() pick the β base from the tag.

With the single message type, the four sibling-bus call sites in hash_kernel
(MV/MU x bit0/bit1) collapse into one iteration that dispatches add/remove
on the flag. Also updates the processor hasher test harness to emit the
unified SiblingMsg.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BitwiseResponseMsg encoded identically to BitwiseMsg against BusId::Bitwise,
only renaming the result field to z. Drop BitwiseResponseMsg, redirect the
single call site in the chiplet-responses emitter to construct a plain
BitwiseMsg, and update the processor test harness.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the 3-variant HasherMsg enum (each variant redundantly carrying kind
/ addr / node_index / payload) with a single struct that holds the kind tag
plus a HasherPayload sum over the three payload widths. encode() now reads
kind, addr, and node_index directly from the struct and only matches on
payload width, cutting three near-identical match arms down to one plus a
width-dispatched loop. Call sites in chiplet_responses update accordingly.
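
The struct-plus-payload shape can be sketched as follows. This is an illustration under assumptions: the payload widths (4 / 8 / 12), the variant names, the use of plain `u64` values in a toy Goldilocks field, and the β ordering are all hypothetical; only the field names `kind`, `addr`, `node_index`, and `payload` come from the description above.

```rust
/// Goldilocks prime 2^64 - 2^32 + 1, used here as a toy stand-in field.
const P: u128 = 0xFFFF_FFFF_0000_0001;

/// Sum over the three payload widths (widths and names are assumed).
pub enum HasherPayload {
    Word([u64; 4]),
    Rate([u64; 8]),
    State([u64; 12]),
}

impl HasherPayload {
    /// The only place that matches on the payload width.
    fn as_slice(&self) -> &[u64] {
        match self {
            HasherPayload::Word(w) => &w[..],
            HasherPayload::Rate(r) => &r[..],
            HasherPayload::State(s) => &s[..],
        }
    }
}

/// Header fields live on the struct once instead of once per variant.
pub struct HasherMsg {
    pub kind: u64,
    pub addr: u64,
    pub node_index: u64,
    pub payload: HasherPayload,
}

impl HasherMsg {
    /// Computes alpha + sum_i beta^i * field_i mod P over
    /// (kind, addr, node_index, payload...).
    pub fn encode(&self, alpha: u64, beta: u64) -> u64 {
        let header = [self.kind, self.addr, self.node_index];
        let b = beta as u128 % P;
        let mut acc = alpha as u128 % P;
        let mut pow = 1u128;
        // One loop covers every payload width.
        for &f in header.iter().chain(self.payload.as_slice()) {
            acc = (acc + pow * (f as u128 % P)) % P;
            pow = pow * b % P;
        }
        acc as u64
    }
}
```

The header is read straight from the struct, so adding a fourth payload width would touch only `HasherPayload` and `as_slice`.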

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MemoryHeader was a 3-field struct whose only purpose was to host four
constructors that .clone()'d themselves into a MemoryMsg. Move the
constructors onto MemoryMsg as associated functions taking (ctx, addr, clk,
payload) directly and drop the header type. Call sites in chiplet_requests
/ hash_kernel / processor memory test update accordingly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the hand-maintained `pub const COUNT: usize = 25` with a const
expression keyed off the last variant's discriminant plus a static
assertion. Adding a new variant with a higher discriminant now bumps COUNT
automatically; a gap or a missed discriminant bump surfaces as a compile
error instead of silently inflating the bus-prefix table.
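
A minimal sketch of the pattern, assuming a three-variant toy enum. The discriminant values and the `ALL` slice are hypothetical and may not match how the real `BusId` is laid out; the point is only that `COUNT` is derived from the last variant and checked at compile time.

```rust
/// Toy stand-in for the bus-identifier enum (discriminants are hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum BusId {
    BlockHash = 0,
    Range = 1,
    Bitwise = 2,
}

impl BusId {
    /// Every variant in discriminant order (hand-maintained in this sketch).
    pub const ALL: &'static [BusId] = &[BusId::BlockHash, BusId::Range, BusId::Bitwise];

    /// Keyed off the last variant's discriminant rather than a literal.
    pub const COUNT: usize = BusId::Bitwise as usize + 1;
}

// Static assertion, evaluated at compile time: a gap or a duplicated
// discriminant makes one of these checks fail and stops the build.
const _: () = {
    assert!(BusId::COUNT == BusId::ALL.len());
    let mut i = 0;
    while i < BusId::ALL.len() {
        assert!(BusId::ALL[i] as usize == i);
        i += 1;
    }
};
```

Changing `Bitwise = 2` to `Bitwise = 3` in this sketch makes `COUNT` evaluate to 4 while `ALL.len()` stays 3, so the first assertion rejects the program at compile time.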

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LookupFractions::new took a builder-generic LB just to reach air.column_shape()
/ air.num_columns(). Replace with LookupFractions::from_shape(shape, num_rows)
that takes the shape slice directly and drops the LookupBuilder bound.
Callers (prover build_lookup_fractions, test fixtures) pass the shape from
the driving AIR explicitly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
split_off(num_rows * num_cols) allocated a one-row Vec just to read
last_row[0] and drop the rest. Index into data directly, copy the single
committed final EF, and truncate in place.
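
The change can be sketched as below, assuming a row-major buffer that holds one extra trailing row and using `u64` as a stand-in for the extension-field element. The function name `take_committed_final` is hypothetical.

```rust
/// Reads the first element of the trailing row and drops that row in place.
///
/// `data` is row-major and is expected to hold more than `num_rows * num_cols`
/// elements; the element at index `num_rows * num_cols` is the value to keep.
pub fn take_committed_final(data: &mut Vec<u64>, num_rows: usize, num_cols: usize) -> u64 {
    let cut = num_rows * num_cols;
    assert!(data.len() > cut, "buffer must extend past the kept rows");
    // Copy the single value out by index instead of calling split_off(cut),
    // which would allocate a new Vec for the whole trailing row.
    let committed = data[cut];
    // Shrink the length in place; the existing allocation is retained.
    data.truncate(cut);
    committed
}
```

For a 12-element buffer `0..12` with `num_rows = 3` and `num_cols = 3`, the call returns the element at index 9 and leaves the first nine elements untouched.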

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The constraint-path adapter's low-degree closing check
(`acc = committed_final` on the last row) is only sound when the last row
fires no bus interactions — i.e. every fraction column's last-row value is
zero. Add a `when_last_row acc[i] = 0` constraint in the fraction-column
branch of `ConstraintLookupBuilder::next_column` so the invariant is
enforced algebraically, not just assumed. Also tightens the Montgomery
invariant comment in `invert_and_scale` to state the loop invariant
explicitly.

push_sibling only reads from the MainTrace view; the ExecutionTrace handle
was a stale parameter left from an earlier iteration of the helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…an_ops

The helper is now consumed by
processor::trace::tests::chiplets::hasher::respan_hasher_bus, so the
carve-out against the dead-code lint is no longer necessary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MPVERIFY emits exactly 1 HOUT add and MRUPDATE emits exactly 2 (one per
leg). The > 1 / > 2 bounds were overly loose and would let a duplicate-
emission bug slip past. Pin the expected counts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The header described per-row deltas and end-of-program closure, matching
the old running-product model. The actual harness is column-blind and
subset-based, so rewrite the doc to match. Also fix two stale intra-doc
links from the earlier LookupFractions::new -> from_shape rename.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…heck

build_lookup_fractions_on_tiny_span and
build_lookup_fractions_matches_constraint_path_oracle duplicated ~30 lines
of setup, and the oracle check already subsumes the plumbing check (if the
constraint-path agreement holds, no zero-denominator or shape-overflow bug
fired in the prover path). Keep a single test; extract the program body into
a tiny_span() helper so future changes to the canonical program land in one
place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-file import

Six helper signatures used fully-qualified miden_air::trace::MainTrace in
their parameter lists. Bring MainTrace in at the top of each test module
instead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The sibling-table MRUPDATE bus test lived in its own top-level
`tests/hasher.rs` alongside `init_leaves` / `init_leaf` / `word_to_ints`
helpers, duplicated with the ones in `tests/chiplets/hasher.rs`. Merge
both into `tests/chiplets/hasher.rs` (one hasher test file covering all
chiplet bus scenarios) with the Merkle helpers at the end.

Pure rustfmt output; no semantic change.

- Drop stale Milestone-B block comment in prove_verify.rs.
- Remove historical "aux trace is no longer pre-built" paragraph in trace/mod.rs.
- Delete the stale `let _ = ace_start;` dead-binding in chiplets/fill_trace.
- Rewrite the `bus-debugger` feature doc to note the real-trace bus debugger
  is not yet wired into the prover/processor paths — the feature currently
  only pulls in the LookupAir shape-validation walker.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The M1/M_2+5/M3/M4/C1/C2/C3 shorthand was a scaffolding artifact from
the packing exploration and no longer maps 1:1 to a column identity
the rest of the codebase understands. Replace every remaining mention
(module headers, struct docs, const docs, emitter docstrings) with a
content-based description — "block-stack + u32rc + logpre + range-
table response", "chiplet responses", etc.

Also collapse the `num_columns` bodies in `MainLookupAir` and
`ChipletLookupAir` to `*_COLUMN_SHAPE.len()`, matching the style the
`MIDEN_COLUMN_SHAPE`-driven aggregator uses and keeping the count in
a single source of truth.

The M4 references in `constraints/chiplets/permutation/state.rs` are
the Poseidon2 4x4 matrix name and are unrelated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…or path

Three small cleanups on `air/src/lib.rs`:

1. `ProcessorAir::num_columns` returns `MIDEN_COLUMN_SHAPE.len()` rather
   than the magic `7`, keeping the count in one place.

2. `build_aux_trace` now `debug_assert_eq!`s that the generic driver
   returned exactly one real committed final before the
   `NUM_LOGUP_COMMITTED_FINALS = 2` zero-pad. This documents the invariant
   the pad relies on, so a future shape change trips the assertion rather
   than silently pushing an extra zero onto a longer vector.

3. Drop `ReduceBoundaryBuilder::error`. Denominators are random-challenge
   LogUp prefixes (`alpha + sum(beta^i * field_i)`) that are non-zero on
   any legitimate proof with overwhelming probability, and the outer
   quotient check already rejects the degenerate case, so threading a
   fallible result through the reducer adds no safety. Replace with an
   `expect` whose message names the soundness argument.
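
The denominator shape in item 3 can be illustrated with a toy version, under these assumptions: the Goldilocks base field stands in for the extension field, the sum starts at `beta^0`, and the names `encode` and `try_inverse` are hypothetical rather than the PR's API.

```rust
/// Goldilocks prime 2^64 - 2^32 + 1, used here as a toy stand-in field.
const P: u128 = 0xFFFF_FFFF_0000_0001;

/// Multiplication mod P.
pub fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 % P) * (b as u128 % P) % P) as u64
}

/// LogUp-style denominator: alpha + sum_i beta^i * fields[i] mod P.
pub fn encode(alpha: u64, beta: u64, fields: &[u64]) -> u64 {
    let mut acc = (alpha as u128 % P) as u64;
    let mut pow = 1u64;
    for &f in fields {
        acc = ((acc as u128 + mul(pow, f) as u128) % P) as u64;
        pow = mul(pow, beta);
    }
    acc
}

/// Inverse via Fermat's little theorem (x^(P-2)); None when x is zero mod P.
pub fn try_inverse(x: u64) -> Option<u64> {
    let mut base = x as u128 % P;
    if base == 0 {
        return None;
    }
    let mut exp = P - 2;
    let mut result = 1u128;
    // Square-and-multiply exponentiation.
    while exp > 0 {
        if exp & 1 == 1 {
            result = result * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    Some(result as u64)
}
```

A caller that relies on the soundness argument above would write something like `try_inverse(d).expect("LogUp denominator is non-zero except with negligible probability")` instead of propagating an error value.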

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Wire HORNERBASE (2 element reads) and HORNEREXT (1 word read) memory
  requests into the chiplet-requests LogUp bus; without these the memory
  bus is unbalanced on any program using horner_eval_base/ext.
- Add AuxBusBoundary(1) zero-constraint to the ACE LogUpBoundaryConfig
  (batched at γ² so it cannot cancel against the γ¹ boundary sum). The
  native verifier already ran a runtime zero-check; the recursive path
  now enforces the same invariant.
- Refactor messages.rs: route simple fixed-layout encoders (Bitwise,
  OpGroup, StackOverflow, AceInit, Range, LogCapacity, AceWire, KernelRom,
  HasherPermLink, BlockHash) through Challenges::encode; factor the
  MemoryResponseMsg prefix mux into two 2-way selects; align stale State/
  Rate/Word + memory-response width comments to a payload-only convention.
- Regenerate constraints_eval.masm, mod.masm, and config.rs RELATION_DIGEST;
  eval-gate count 5548 → 5580, adv_pipe repeat 768 → 772.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@huitseeker changed the title from "WIP: All-LogUp auxiliary trace bus constraints" to "All-LogUp auxiliary trace bus constraints" on Apr 23, 2026
