Hasher chiplet redesign by Al-Kindi-0 · Pull Request #2927 · 0xMiden/miden-vm

Al-Kindi-0 · 2026-03-27T15:13:16Z

This PR bundles several closely related changes in the hasher / chiplets area:

Hasher controller/permutation split -- the hasher trace is split into a compact controller region and a separate permutation segment, enabling permutation deduplication.
Packed 16-row Poseidon2 permutation segment -- the 31-step Poseidon2 schedule is packed from 32 rows down to 16 rows per unique permutation.
Sibling table soundness fix (#2220) -- a new mrupdate_id column domain-separates sibling-table entries, preventing cross-operation sibling swapping.
Memory address range checks (#1614) -- the memory chiplet gets w0/w1 address-limb columns, with 16-bit range-check lookups routed through the wiring bus.

These changes all touch the chiplets trace layout, bus plumbing, and AIR structure, so landing them together keeps the transition coherent.

Why

1. Deduplicate repeated permutations

The old monolithic hasher consumed 32 rows per permutation request, even if the same input state appeared repeatedly.

With the new design:

the controller records each request as a 2-row (input, output) pair,
the permutation segment executes one packed 16-row cycle per unique input state,
a multiplicity counter records how many controller pairs map to the same cycle.
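A minimal sketch of the deduplication step follows. It assumes the prover collects the input states of all controller requests and groups them by exact equality. The `State` type, the `dedup_requests` name, and modeling field elements as plain `u64` are illustrative assumptions, not the miden-vm API.

```rust
use std::collections::HashMap;

/// Hypothetical 12-element hasher state (h0..h11); field elements modeled as u64.
type State = [u64; 12];

/// Groups controller requests by input state. Each returned entry stands for one
/// packed 16-row cycle in the permutation segment, paired with the number of
/// controller (input, output) pairs that map to it. Entries keep first-seen order.
fn dedup_requests(requests: &[State]) -> Vec<(State, u64)> {
    let mut index: HashMap<State, usize> = HashMap::new();
    let mut cycles: Vec<(State, u64)> = Vec::new();
    for &input in requests {
        let next = cycles.len();
        // Look up the cycle for this input, or reserve a new slot for it.
        let i = *index.entry(input).or_insert(next);
        if i == next {
            cycles.push((input, 1)); // first occurrence: new unique cycle
        } else {
            cycles[i].1 += 1; // repeat: bump the multiplicity
        }
    }
    cycles
}
```

The length of the returned vector plays the role of U, and the multiplicities sum to M.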

For M requests with U unique input states, the rough cost changes from:

old: 32M
new: 2M + pad_to_16 + 16U

This is a clear win whenever states repeat (Merkle workloads, identical MAST roots, ...).
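The rough cost formulas can be turned into a small calculator. It assumes `pad_to_16` means padding the 2M-row controller region up to the next multiple of 16 so the permutation segment starts on a 16-row boundary; that reading is an assumption, and `old_cost`/`new_cost` are hypothetical helpers.

```rust
/// Rough row cost of the old monolithic hasher: 32 rows per permutation request.
fn old_cost(m: u64) -> u64 {
    32 * m
}

/// Rough row cost of the new design for `m` requests with `u` unique input states.
/// Assumption: the 2M controller rows are padded up to a multiple of 16.
fn new_cost(m: u64, u: u64) -> u64 {
    let controller = 2 * m;
    let padded_controller = (controller + 15) / 16 * 16;
    padded_controller + 16 * u
}
```

Under this model, even with no repeated inputs (U = M) the new layout costs about 18M rows plus padding versus 32M, so the 16-row packing helps on its own; for a single request both come to 32 rows.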

2. Fix sibling-table soundness

The old sibling-table encoding was vulnerable to cross-operation sibling reuse. Adding mrupdate_id domain-separates entries so sibling-table balance is enforced per MRUPDATE instance, not globally across unrelated operations.
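A toy multiset model illustrates why the extra key matters. It assumes each MRUPDATE adds the siblings of its old-root path to the table and removes the siblings of its new-root path, with entries keyed by (mrupdate_id, node_index, sibling). The `Entry` type and `is_balanced` function are hypothetical; the real table is enforced through a bus, not a `HashMap`. With `use_id = false` the id is dropped from the key, so two updates that swap siblings still balance globally; with the id, each instance must balance on its own.

```rust
use std::collections::HashMap;

/// Hypothetical sibling-table entry: (mrupdate_id, node_index, sibling digest),
/// with the digest modeled as four u64 words.
type Entry = (u64, u64, [u64; 4]);

/// Net multiplicity per key: +1 for each added sibling, -1 for each removed one.
/// When `use_id` is false the mrupdate_id is ignored (old, global behavior).
/// The table is balanced when every key nets to zero.
fn is_balanced(adds: &[Entry], removes: &[Entry], use_id: bool) -> bool {
    let key = |&(id, idx, sib): &Entry| if use_id { (id, idx, sib) } else { (0, idx, sib) };
    let mut net: HashMap<Entry, i64> = HashMap::new();
    for e in adds {
        *net.entry(key(e)).or_insert(0) += 1;
    }
    for e in removes {
        *net.entry(key(e)).or_insert(0) -= 1;
    }
    net.values().all(|&v| v == 0)
}
```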

3. Add memory address decomposition checks

The memory chiplet now decomposes word addresses into two 16-bit limbs and proves the decomposition using range-check lookups. This closes an important missing piece in memory soundness while reusing the existing wiring-bus infrastructure.

Design

Hasher: two-region trace

The hasher trace is split into two contiguous regions:

Controller (perm_seg = 0)
Compact input/output row pairs, one pair per permutation request.
Permutation segment (perm_seg = 1)
One packed 16-row Poseidon2 cycle per unique input state.

A LogUp permutation-link on the shared V_WIRING auxiliary column ties controller requests to the corresponding permutation cycles.
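A toy model of the permutation-link balance follows. It assumes each message `[flag, h0..h11]` (flag 0 for input, 1 for output, following the `encode([0|1, h0..h11])` description in the review below) is compressed as `alpha + flag + beta*h0 + beta^2*h1 + ...`. Controller pairs are added with multiplicity 1 and permutation cycles are subtracted with their multiplicity. For simplicity the arithmetic is over the Goldilocks base field; the actual auxiliary trace uses extension-field challenges, and `encode`/`perm_link_sum` are hypothetical names.

```rust
/// Goldilocks prime p = 2^64 - 2^32 + 1.
const P: u64 = 0xFFFF_FFFF_0000_0001;

fn add(a: u64, b: u64) -> u64 { ((a as u128 + b as u128) % P as u128) as u64 }
fn sub(a: u64, b: u64) -> u64 { ((a as u128 + P as u128 - b as u128) % P as u128) as u64 }
fn mul(a: u64, b: u64) -> u64 { ((a as u128 * b as u128) % P as u128) as u64 }

/// Square-and-multiply exponentiation in the field.
fn pow(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1u64;
    while exp > 0 {
        if exp & 1 == 1 { acc = mul(acc, base); }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

/// Inverse via Fermat's little theorem (input assumed nonzero).
fn inv(a: u64) -> u64 { pow(a, P - 2) }

/// Hypothetical compression of [flag, h0, ..., h11].
fn encode(alpha: u64, beta: u64, flag: u64, state: &[u64; 12]) -> u64 {
    let mut acc = add(alpha, flag);
    let mut coeff = 1u64;
    for &h in state {
        coeff = mul(coeff, beta);
        acc = add(acc, mul(coeff, h));
    }
    acc
}

/// LogUp running sum: +1/msg per controller message, -m/msg per permutation cycle.
/// The link is balanced when the result is zero.
fn perm_link_sum(
    alpha: u64,
    beta: u64,
    controller: &[([u64; 12], [u64; 12])],
    cycles: &[([u64; 12], [u64; 12], u64)],
) -> u64 {
    let mut sum = 0u64;
    for (input, output) in controller {
        sum = add(sum, inv(encode(alpha, beta, 0, input)));
        sum = add(sum, inv(encode(alpha, beta, 1, output)));
    }
    for (input, output, m) in cycles {
        let m = *m % P;
        sum = sub(sum, mul(m, inv(encode(alpha, beta, 0, input))));
        sum = sub(sum, mul(m, inv(encode(alpha, beta, 1, output))));
    }
    sum
}
```

Three identical controller pairs balance against one cycle with multiplicity 3; recording multiplicity 2 leaves a nonzero residue.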

Packed 16-row Poseidon2 schedule

The 31-step Poseidon2 schedule is packed as:

row 0: init + ext1
rows 1-3: ext2..ext4
rows 4-10: 7 × (3 packed internal rounds)
row 11: int22 + ext5
rows 12-14: ext6..ext8
row 15: boundary / final state
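The row listing can be checked against the 31-step count. Reading the listing as 1 init step, 8 external rounds (ext1..ext8), and 22 internal rounds (int1..int22) gives 1 + 8 + 22 = 31. The sketch below encodes that reading; the `Step` enum and `packed_row` function are illustrative, not the miden-vm code.

```rust
/// Steps of the Poseidon2 schedule as read from the row listing above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Step {
    Init,
    Ext(u8), // external rounds 1..=8
    Int(u8), // internal rounds 1..=22
}

/// The schedule in execution order: init, ext1..ext4, int1..int22, ext5..ext8.
fn schedule() -> Vec<Step> {
    let mut steps = vec![Step::Init];
    steps.extend((1..=4).map(Step::Ext));
    steps.extend((1..=22).map(Step::Int));
    steps.extend((5..=8).map(Step::Ext));
    steps
}

/// Row (0..=14) of the packed 16-row cycle on which a step is applied.
/// Row 15 holds the boundary / final state and applies no step.
fn packed_row(step: Step) -> u8 {
    match step {
        Step::Init | Step::Ext(1) => 0,
        Step::Ext(k @ 2..=4) => k - 1,            // rows 1-3
        Step::Int(k @ 1..=21) => 4 + (k - 1) / 3, // rows 4-10, three per row
        Step::Int(22) | Step::Ext(5) => 11,
        Step::Ext(k @ 6..=8) => k + 6,            // rows 12-14
        _ => panic!("step outside the 31-step schedule"),
    }
}
```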

Packed internal rows use s0/s1/s2 as witness columns on permutation rows to keep the constraint degree bounded. Unused witness slots are explicitly zero-constrained (out of caution), though this could be relaxed.
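A worked degree count shows why the witnesses are needed. It assumes the S-box is x^7 (the usual choice for Poseidon2 over the 64-bit Goldilocks field), that each internal round R_k adds a round constant c_k to one lane, applies the S-box to it, and then applies an affine internal layer, and that s0, s1, s2 hold the three S-box outputs of a packed row. This is one plausible witness assignment, not necessarily the one used in the AIR.

```latex
\begin{aligned}
&\text{Inlined: } \mathbf{h}' = R_3 \circ R_2 \circ R_1(\mathbf{h}),
  \qquad \deg \mathbf{h}' = 7 \cdot 7 \cdot 7 = 343 \\[4pt]
&\text{Witnessed: } s_k = \bigl(a_k(\mathbf{h}, s_0, \dots, s_{k-1}) + c_k\bigr)^7, \qquad k = 0, 1, 2 \\
&\phantom{\text{Witnessed: }} \mathbf{h}' = A(\mathbf{h}, s_0, s_1, s_2) \\[4pt]
&a_k,\ A \text{ affine} \;\Rightarrow\; \text{each constraint has degree} \le 7 \text{ before selector flags}
\end{aligned}
```

Under these assumptions this gives 3 witness checks plus 12 next-state constraints, matching the count of 15 for packed internal rows in the constraint table.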

Column layout

Hasher: 16 -> 20

s0 s1 s2 | h0..h11 | node_index | mrupdate_id | is_boundary | direction_bit | perm_seg
   3     |   12    |     1      |      1      |      1      |       1       |    1      = 20

New / newly significant columns:

mrupdate_id -- domain separator for sibling-table entries
is_boundary -- marks first controller input / last controller output
direction_bit -- propagated Merkle routing bit on controller rows
perm_seg -- explicit controller vs permutation-region flag
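The layout above can be written as column offsets with a compile-time width check. The constant names and the assumption that columns appear in exactly the listed order are illustrative; they are not the miden-vm constants.

```rust
// Hypothetical column offsets within the 20-column hasher region,
// in the order given by the layout above.
const S0: usize = 0;                        // s0..s2: selectors / packed witnesses
const H0: usize = S0 + 3;                   // h0..h11: hasher state
const NODE_INDEX: usize = H0 + 12;
const MRUPDATE_ID: usize = NODE_INDEX + 1;  // sibling-table domain separator
const IS_BOUNDARY: usize = MRUPDATE_ID + 1;
const DIRECTION_BIT: usize = IS_BOUNDARY + 1;
const PERM_SEG: usize = DIRECTION_BIT + 1;  // 0 = controller, 1 = permutation segment
const HASHER_WIDTH: usize = PERM_SEG + 1;

// Fails to compile if the offsets no longer add up to 20 columns.
const _: () = assert!(HASHER_WIDTH == 20);
```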

Memory: 15 -> 17

Two new columns:

w0
w1

These decompose the word address into 16-bit limbs. The wiring bus carries the corresponding range-check lookups.
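A sketch of the decomposition and the checks it implies follows. Each 16-bit range-check lookup is modeled as an integer comparison `v < 2^16`. Following the review excerpt below (`w1_mul4`), it assumes the lookups cover w0, w1, and 4·w1. The third lookup bounds w1 below 2^14, so only addresses below 2^30 pass. `decompose` and `check` are hypothetical helpers.

```rust
/// Splits a word address into two 16-bit limbs: addr = w0 + 2^16 * w1.
fn decompose(addr: u32) -> (u16, u16) {
    ((addr & 0xFFFF) as u16, (addr >> 16) as u16)
}

/// Checks a claimed decomposition: three 16-bit range checks (w0, w1, 4*w1)
/// plus the recomposition identity. Values are small enough that no field
/// wrap-around occurs, so plain integers model the field checks here.
fn check(addr: u64, w0: u64, w1: u64) -> bool {
    let in_range = |v: u64| v < (1 << 16);
    in_range(w0) && in_range(w1) && in_range(4 * w1) && addr == w0 + (1 << 16) * w1
}
```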

Constraints

Hasher constraints now total 100.

Constraint group breakdown

Group	Count	Purpose
Selector booleanity	3	`s0,s1,s2` binary on controller rows
Perm segment	7	`perm_seg` confinement, booleanity, monotonicity, cycle alignment
Structural	7	Confine `is_boundary` / `direction_bit` to valid row types
Lifecycle	2	Operation lifecycle invariants
Controller adjacency	2	Input row must be followed by output row
Controller pairing	4	First-row constraint, output non-adjacency, padding stability
Perm witness-shape	3	Zero witness slots when unused
Perm init+ext	12	Row 0 packed transition
Perm external	12	External-round transitions
Perm packed internal	15	3 witness checks + 12 next-state constraints
Perm int+ext	13	1 witness check + 12 next-state constraints
MRUPDATE ID	2	Increment / zero-on-perm rules
Sponge capacity	4	Preserve capacity across continuations
Output index	1	Output-row `node_index` rule
Merkle index	4	Index decomposition / continuity / direction bit
Merkle input state	4	Zero capacity on Merkle input rows
Merkle routing	5	Route digest into correct rate half
Total	100
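As a consistency check, the group counts in the table can be summed. The values below are copied from the table; the `GROUPS` constant is only an illustration.

```rust
/// Hasher constraint groups and their counts, copied from the table above.
const GROUPS: [(&str, usize); 17] = [
    ("Selector booleanity", 3),
    ("Perm segment", 7),
    ("Structural", 7),
    ("Lifecycle", 2),
    ("Controller adjacency", 2),
    ("Controller pairing", 4),
    ("Perm witness-shape", 3),
    ("Perm init+ext", 12),
    ("Perm external", 12),
    ("Perm packed internal", 3 + 12), // witness checks + next-state constraints
    ("Perm int+ext", 1 + 12),         // witness check + next-state constraints
    ("MRUPDATE ID", 2),
    ("Sponge capacity", 4),
    ("Output index", 1),
    ("Merkle index", 4),
    ("Merkle input state", 4),
    ("Merkle routing", 5),
];

/// Sum of all group counts.
fn total() -> usize {
    GROUPS.iter().map(|&(_, n)| n).sum()
}
```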

Trace width impact

Chiplet	Before	After	Delta
Hasher	16	20	+4
Memory	15	17	+2
Net main trace impact			+1

The new main trace width is 72.

No new auxiliary columns were added:

the permutation-link bus shares V_WIRING
memory address range checks also use the existing wiring-bus path

Al-Kindi-0 · 2026-03-27T15:18:53Z

For comparison with the numbers in #2869 for the recursive verifier (verifying a program that executes in 2^20 cycles):

  ┌────────────────────────────┬──────────────┬──────────────┬───────────┬──────────┐
  │         Component          │   Old        │   New        │  Change   │ Savings  │
  ├────────────────────────────┼──────────────┼──────────────┼───────────┼──────────┤
  │ Core trace (decoder+stack) │ 41,652       │ 41,516       │ -136      │          │
  ├────────────────────────────┼──────────────┼──────────────┼───────────┼──────────┤
  │ Range checker              │ 5,129        │ 5,217        │ +88       │          │
  ├────────────────────────────┼──────────────┼──────────────┼───────────┼──────────┤
  │ Chiplets total             │ 273,769      │ 118,657      │ -155,112  │ -57%     │
  ├────────────────────────────┼──────────────┼──────────────┼───────────┼──────────┤
  │ - Hasher                   │ 250,816      │ 96,256       │ -154,560  │ -62%     │
  ├────────────────────────────┼──────────────┼──────────────┼───────────┼──────────┤
  │ - Bitwise                  │ 3,104        │ 3,104        │ 0         │          │
  ├────────────────────────────┼──────────────┼──────────────┼───────────┼──────────┤
  │ - Memory                   │ 13,758       │ 13,406       │ -352      │          │
  ├────────────────────────────┼──────────────┼──────────────┼───────────┼──────────┤
  │ - ACE                      │ 6,090        │ 5,890        │ -200      │          │
  ├────────────────────────────┼──────────────┼──────────────┼───────────┼──────────┤
  │ - Kernel ROM               │ 0            │ 0            │ 0         │          │
  ├────────────────────────────┼──────────────┼──────────────┼───────────┼──────────┤
  │ Padded trace length        │524,288 (2^19)│131,072 (2^17)│           │ -4x      │
  ├────────────────────────────┼──────────────┼──────────────┼───────────┼──────────┤
  │ Padding                    │ 47%          │ 9%           │           │          │
  └────────────────────────────┴──────────────┴──────────────┴───────────┴──────────┘

This is a 4x reduction in padded trace length in the above case.
(Note that the change in the number of decoder+stack rows is due to a change in the constraints, which affects ACE circuit loading.)
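The padded lengths and padding percentages in the table can be reproduced from the row counts. This assumes the padded trace length is the next power of two of the longest segment (core trace, range checker, chiplets) and that padding is measured against that longest segment. Both assumptions are inferred from the numbers, not taken from the implementation.

```rust
/// Assumed rule: pad to the next power of two of the longest segment.
fn padded_len(segments: &[u64]) -> u64 {
    segments.iter().copied().max().unwrap_or(1).next_power_of_two()
}

/// Fraction of the padded trace not covered by the longest segment.
fn padding_fraction(segments: &[u64]) -> f64 {
    let longest = segments.iter().copied().max().unwrap_or(0) as f64;
    1.0 - longest / padded_len(segments) as f64
}
```

With the table's rows for core trace, range checker, and chiplets, this gives 2^19 before and 2^17 after, with padding of roughly 47.8% and 9.5%.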

huitseeker · 2026-03-30T13:14:36Z

+    let w1: AB::Expr = local.chiplets[MEMORY_WORD_ADDR_HI_COL_IDX - CHIPLETS_OFFSET].clone().into();
+    let w1_mul4: AB::Expr = w1.clone() * AB::Expr::from_u16(4);
+
+    let den0: AB::ExprEF = alpha.clone() + Into::<AB::ExprEF>::into(w0);


Should this add protocol-level domain separation before v_wiring can safely carry ACE wires, raw memory range-check values, and the new hasher perm-link messages together? Right now the memory side uses plain alpha + w0/w1/4*w1, ACE uses encode([clk, ctx, id, ...]), and the perm-link uses encode([0|1, h0..h11]) on the same LogUp column.

If any of those encodings were to alias, could one subsystem cancel another on the shared sum? #1614 explicitly called out adding an op-label when reusing the wiring bus, and I don't see that namespace implemented here yet.
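The concern can be made concrete with a toy sketch. It assumes a hypothetical compression `encode(m) = alpha + m_0 + beta*m_1 + beta^2*m_2 + ...` over the Goldilocks field, chosen so that a one-element message reduces to `alpha + v`, the same form as `alpha + w0` in the excerpt. The real VM's encoding and challenge field may differ, so whether this exact collision exists there is not established here. The label values `LABEL_RANGE_CHECK` and `LABEL_PERM_LINK` are invented for illustration.

```rust
/// Goldilocks prime p = 2^64 - 2^32 + 1.
const P: u64 = 0xFFFF_FFFF_0000_0001;

fn add(a: u64, b: u64) -> u64 { ((a as u128 + b as u128) % P as u128) as u64 }
fn mul(a: u64, b: u64) -> u64 { ((a as u128 * b as u128) % P as u128) as u64 }

/// Hypothetical compression: alpha + sum_i beta^i * m_i (m_0 has coefficient 1).
fn encode(alpha: u64, beta: u64, msg: &[u64]) -> u64 {
    let mut acc = alpha;
    let mut coeff = 1u64;
    for &m in msg {
        acc = add(acc, mul(coeff, m));
        coeff = mul(coeff, beta);
    }
    acc
}

/// Hypothetical op-labels prepended to messages on the shared column.
const LABEL_RANGE_CHECK: u64 = 1;
const LABEL_PERM_LINK: u64 = 2;

/// Range-check denominator; unlabeled it is exactly alpha + v.
fn range_check_msg(alpha: u64, beta: u64, v: u64, labeled: bool) -> u64 {
    if labeled {
        encode(alpha, beta, &[LABEL_RANGE_CHECK, v])
    } else {
        encode(alpha, beta, &[v])
    }
}

/// Perm-link denominator for [flag, h0..h11], optionally prefixed with a label.
fn perm_link_msg(alpha: u64, beta: u64, flag: u64, state: &[u64; 12], labeled: bool) -> u64 {
    let mut msg = Vec::with_capacity(14);
    if labeled {
        msg.push(LABEL_PERM_LINK);
    }
    msg.push(flag);
    msg.extend_from_slice(state);
    encode(alpha, beta, &msg)
}
```

Under this encoding, range-checking the value 1 and an output-flagged perm-link message for an all-zero state give the same denominator, so the two could cancel on the shared sum. Distinct labels remove that structural collision.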

Nashtare

I will need to do another pass because this is pretty dense, but I left a couple of comments while familiarizing myself with it.

adr1anh

This looks good to me, though a proper review will happen as @Nashtare and I finalize the constraint refactoring and LogUp transition.

Al-Kindi-0 · 2026-04-06T13:17:29Z

I am merging this so we can conclude #2856 and #2962 and proceed with the multi-table migration.

I will comment in the referenced PRs with my take on the approaches there, but broadly: we should avoid changes that are not clear wins or do not have major perf improvements.

There are many opportunities for factoring out and centralizing computations, but it is plausible that once we have automated witness generation for the auxiliary trace, these optimizations could be handled on the backend side. It may therefore make more sense to prioritize changes that improve auditability, readability, and soundness (e.g., domain separation). The same applies to constraints — particularly auxiliary constraints — where simplifications from the unified bus architecture would create easier optimization opportunities on the backend side later.

In other words, we should prioritize reaching the multi-table migration as soon as possible. The work in the referenced PRs should take this into account.

Each PR from here through the multi-table milestone should justify its existence with this goal in mind:

Performance work is likely lower priority given that things will change soon. If pursued, it should include benchmarks demonstrating the gain — especially if readability is compromised.
Readability/auditability work should be discussed and consensus reached, unless the changes are clearly and objectively superior. Changes that add abstraction layers without materially improving understanding for someone familiar with the constraint system do not meet this bar.

Incorporate the hasher chiplet redesign (#2927) into the constraint simplification branch. The hasher now uses a 16-row packed cycle with a controller/permutation split architecture, replacing the previous 32-row cycle. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Al-Kindi-0 added 10 commits March 27, 2026 19:15

wip: all tests passing but ace-codegen needs updating

c6b1e95

trim down ace-codegen crate

a3c2450

minor fixes and cleanups

055a394

address feedback

ac59e22

wip: all tests passing but ace-codegen needs updating

713c132

wip: initial simple design

3c6d224

wip: packing permutation rows

589a760

Improve and harden constraints and their description

1ec225c

minor updates to hasher.md

3be55b5

fix post rebase issues

77773d9

Al-Kindi-0 force-pushed the al-hasher-chiplet-redesign branch from da140ac to 77773d9 Compare March 27, 2026 15:19

Al-Kindi-0 changed the title ~~Al hasher chiplet redesign~~ Hasher chiplet redesign Mar 27, 2026

adr1anh self-requested a review March 28, 2026 09:28

adr1anh mentioned this pull request Mar 29, 2026

fix(air): enforce word_addr ∈ [0, 2³²) in the memory chiplet AIR (soundness fix) #2935

Closed

12 tasks

Al-Kindi-0 added 2 commits March 30, 2026 16:16

add sum propagating constraint w_bus

0569c70

update design docs

15dae86

huitseeker reviewed Mar 30, 2026

Nashtare reviewed Apr 1, 2026

adr1anh approved these changes Apr 6, 2026

bobbinth reviewed Apr 6, 2026

Comment thread docs/src/design/chiplets/index.md

Al-Kindi-0 mentioned this pull request Apr 6, 2026

Update chiplets diagram to reflect controller/perm split architecture #2967

Open

Al-Kindi-0 added 3 commits April 6, 2026 15:43

address comments

a1856d1

Merge remote-tracking branch 'origin/next' into al-hasher-chiplet-redesign

c6e46da

add changelog

47ed3e0

Al-Kindi-0 merged commit b7255bb into next Apr 6, 2026
18 checks passed

Al-Kindi-0 deleted the al-hasher-chiplet-redesign branch April 6, 2026 12:51

Al-Kindi-0 mentioned this pull request Apr 8, 2026

Element-addressable memory follow-up #1614

Closed

3 tasks

Al-Kindi-0 mentioned this pull request Apr 8, 2026

Constraints: Investigate alternative for hasher sibling table #2220

Closed

adr1anh mentioned this pull request Apr 14, 2026

refactor(air): remove tagging, apply uniform constraint description, and optimize evaluation #2856

Merged

Al-Kindi-0 merged 15 commits into next from al-hasher-chiplet-redesign

5 participants
