Skip to content

pulse: LCM-merge stream-axis dims + slope-based per-pulse for Range#2202

Open
JulienBalianSonos wants to merge 2 commits into
mainfrom
fix/pulse-lcm-merge-stream-axis
Open

pulse: LCM-merge stream-axis dims + slope-based per-pulse for Range#2202
JulienBalianSonos wants to merge 2 commits into
mainfrom
fix/pulse-lcm-merge-stream-axis

Conversation

@JulienBalianSonos
Copy link
Copy Markdown
Collaborator

@JulienBalianSonos JulienBalianSonos commented May 7, 2026

When two parallel pulse paths converge at an elementwise op, the merged tensor's stream-axis dim is the Least Common Multiple (LCM) of the inputs' per-pulse stream dims, not the NumPy-style broadcast. Two semantics get conflated otherwise:

  • Tensor shape compatibility (must match at runtime): non-stream axes use NumPy Broadcast. Equal or one is 1.
  • Pulse-divisibility (a scalar constraint on per-pulse cycle): on the stream axis, two paths with steady-state size K_a and K_b are compatible at any pulse that is a multiple of LCM(K_a, K_b).

PulseWrappingOp::pulsed_output_facts was using the typed op's output_facts for shape merging, which falls through to multi_broadcast and produces Broadcast([K_a, K_b]) on the stream axis when paths produce different per-pulse sizes (e.g. ConvTranspose with kernel > stride at steady-state vs initial-state phases). Downstream pulse-divisibility checks then bail.

Changes

  • New TDim::pulse_lcm helper (pure-integer LCM; returns None for symbolic, caller falls back).
  • PulseWrappingOp::pulsed_output_facts post-processes typed output_facts: when the stream-axis dim is a Broadcast, replace with LCM of all stream-bearing inputs' stream dims. Non-stream axes keep Broadcast semantics (correct for runtime shape compatibility).
  • Range::pulsify (pulse/src/ops/array/range.rs): replace stream_dim.substitute(symbol, pulse).to_usize() (full evaluated length, includes constant tail from upstream overlap-add) with slope-based guess_slope(symbol).0 × pulse_int (steady-state increment per pulse). The substitute form was forcing per-pulse to the kernel-state size when downstream paths were producing the steady-state size, surfacing exactly as the Broadcast(stride, kernel) we were merging.
  • TDim::guess_slope made pub (was pub(super)) so pulse op-pulsifiers outside tract-data can use it.

Module-level doc

pulse/src/lib.rs now documents the stream-axis vs non-stream-axis merge contract so future maintainers don't reintroduce the conflation.

Surfaced by

Pocket-TTS Mimi audio decode (upsample[stride=16, kernel=32] → linear → view → unbind → arange-based mask → SDPA → linear → conv). Pre-fix: failed at Pulsification requires pulse Broadcast(16, 32) to be a stride (1) multiple. Post-fix: pulsifies cleanly through optimize at pulse=1, 2, 4, 8.

Depends on

#2201 (tdim: PartialEq second-chance). Needed for the LayerNorm-after-ConvTranspose case where the GCD-factoring rewrite produces algebraically-equal-but-structurally-different canonical forms.

Tests

  • pulse_lcm_basic in tract-data.
  • test_pulse_meet_with_arange_branch_types_through in tract-pulse: minimal repro of the upsample-then-arange-mask pattern; pulsifies cleanly post-fix.
  • 132 + 34 lib tests green; 55 core-proptest-pulse tests green (no regressions in existing pad+conv, deconv, conv+conv, etc. proptests).

Known limitations / follow-ups

  • pulse_lcm returns None for symbolic operands; callers fall back to existing Broadcast behavior (still safe). A symbolic LCM is worth following up.
  • Op-specific pulsifiers (cnn::conv, deconv, fft, etc.) that compute stream-axis sizes in their own pulsify functions don't go through PulseWrappingOp; they should follow the same slope-based convention. This PR fixes Range; quick audit of others recommended.
  • A pre-existing tract bug exists in Deconv pulse runtime semantics (output diverges at chunk boundaries for kernel > stride). Exposed by tract compare --stream once the type-level fix in this PR is in. Filed as #2203.

Comment thread data/src/dim/tree.rs
Comment thread pulse/src/lib.rs Outdated
Comment thread pulse/src/model.rs Outdated
@JulienBalianSonos JulienBalianSonos requested a review from kali May 11, 2026 18:50
- Rename TDim::pulse_lcm -> lcm. The function is a pure integer LCM
  (with overflow guard and None for symbolic / non-positive operands);
  the pulse-meet-point framing belongs at the call site, not on the
  math primitive. Updates the call in stream_axis_lcm and the
  lcm_basic unit test.

- Drop the module-level doc block from pulse/src/lib.rs. The merge
  problem isn't the right anchor for a pulsification overview, and a
  proper overview is bigger than this PR's scope -- left for a
  separate doc pass.

- Move the stream-axis merge-semantics doc onto stream_axis_lcm in
  pulse/src/model.rs, where the mechanism actually lives.

- Refactor stream_axis_lcm to try_fold: removes the Vec allocation
  and the mutable accumulator; same short-circuit semantics.

Tests still green: tract-data 133, tract-pulse 34, core-proptest-pulse 55.
@JulienBalianSonos JulienBalianSonos force-pushed the fix/pulse-lcm-merge-stream-axis branch from 11fcdc4 to 854a2e0 Compare May 11, 2026 18:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants