
Add Reduce abstraction to support quantizations and modify tiled_reduce accordingly #989

@suri-kumkaran

Description

Building on the foundational tiled_reduce pipeline, this issue has two architectural goals. First, it extracts the hardcoded reduction logic into a dedicated Reduce abstraction, which is required to support multi-vector quantizations (handling scaling factors, offsets, and lookup tables).

Second, it reworks the tiled_reduce traversal and reduction flow to support quantized execution paths and extensible reduction strategies. The new abstraction boundary lets reduction implementations specialize accumulation behavior, metadata handling, and intermediate storage layouts without coupling those concerns to the core kernel compute pipeline.
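To make the intended boundary concrete, here is a minimal Rust sketch of what such an abstraction could look like. Everything in it is an assumption for discussion rather than the proposed API: the trait shape, the `Meta` and `Acc` associated types, and the names `PlainReduce`, `ScaledReduce`, `ScaleOffset`, and `reduce_all` are all hypothetical.

```rust
/// Hypothetical reduction interface; names and method set are assumptions.
pub trait Reduce {
    /// Per-vector metadata the reduction needs (e.g. scale and offset).
    type Meta;
    /// Intermediate accumulator; implementations choose their own layout.
    type Acc;

    /// Starting value of the accumulator.
    fn init(&self) -> Self::Acc;
    /// Fold one raw partial result from the kernel into the accumulator.
    fn accumulate(&self, acc: Self::Acc, partial: f32) -> Self::Acc;
    /// Convert the accumulator into the final similarity using the metadata.
    fn finalize(&self, acc: Self::Acc, meta: &Self::Meta) -> f32;
}

/// Unquantized path: a plain sum with no metadata.
pub struct PlainReduce;

impl Reduce for PlainReduce {
    type Meta = ();
    type Acc = f32;

    fn init(&self) -> f32 {
        0.0
    }
    fn accumulate(&self, acc: f32, partial: f32) -> f32 {
        acc + partial
    }
    fn finalize(&self, acc: f32, _meta: &()) -> f32 {
        acc
    }
}

/// Illustrative quantization metadata: an affine correction.
pub struct ScaleOffset {
    pub scale: f32,
    pub offset: f32,
}

/// Quantized path: sum raw partials, then apply scale and offset once.
pub struct ScaledReduce;

impl Reduce for ScaledReduce {
    type Meta = ScaleOffset;
    type Acc = f32;

    fn init(&self) -> f32 {
        0.0
    }
    fn accumulate(&self, acc: f32, partial: f32) -> f32 {
        acc + partial
    }
    fn finalize(&self, acc: f32, meta: &ScaleOffset) -> f32 {
        acc * meta.scale + meta.offset
    }
}

/// Drives any reducer over a slice of kernel partials.
pub fn reduce_all<R: Reduce>(r: &R, partials: &[f32], meta: &R::Meta) -> f32 {
    let mut acc = r.init();
    for &p in partials {
        acc = r.accumulate(acc, p);
    }
    r.finalize(acc, meta)
}
```

In this sketch the caller never inspects the metadata; only `finalize` does, which is one way to keep scaling factors and offsets out of the compute kernel.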

Tasks

  • Design the Reduce abstraction to decouple the similarity reduction step from the core Kernel compute, ensuring it can handle quantization metadata efficiently.
  • Refactor tiled_reduce to route reduction and accumulation logic through the new Reduce abstraction.
  • Implement reduction interfaces capable of supporting quantized accumulation paths, including scaling factors, offsets, and lookup-table-based reductions where required.
  • Refactor the existing f32 and f16 implementations to use the new Reduce abstraction.
  • Validate that the abstraction boundary does not introduce a performance regression against the baseline AVX2 metrics.
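As a rough illustration of the refactoring tasks above, the following Rust sketch routes a tiled traversal through a reducer. It is not the existing tiled_reduce signature: the generic parameters, the tile-kernel closure, and the names `SumReduce`, `RescaleReduce`, `dot_f32`, and `dot_u8` are assumptions. The quantized case assumes symmetric scalar quantization (value = scale × code, no offset), so the dot product of the codes only needs a single rescale at the end.

```rust
/// Hypothetical reduction interface (an assumption, not the project's API).
pub trait Reduce {
    type Acc;
    fn init(&self) -> Self::Acc;
    /// Fold the kernel's result for one tile into the accumulator.
    fn accumulate(&self, acc: Self::Acc, tile_partial: f32) -> Self::Acc;
    fn finalize(&self, acc: Self::Acc) -> f32;
}

/// Walks both inputs tile by tile; all accumulation goes through `reduce`.
pub fn tiled_reduce<T, K, R>(a: &[T], b: &[T], tile: usize, kernel: K, reduce: &R) -> f32
where
    K: Fn(&[T], &[T]) -> f32,
    R: Reduce,
{
    assert_eq!(a.len(), b.len(), "inputs must have equal length");
    assert!(tile > 0, "tile size must be positive");
    let mut acc = reduce.init();
    for (ta, tb) in a.chunks(tile).zip(b.chunks(tile)) {
        acc = reduce.accumulate(acc, kernel(ta, tb));
    }
    reduce.finalize(acc)
}

/// f32 path: plain summation of tile partials.
pub struct SumReduce;

impl Reduce for SumReduce {
    type Acc = f32;
    fn init(&self) -> f32 {
        0.0
    }
    fn accumulate(&self, acc: f32, p: f32) -> f32 {
        acc + p
    }
    fn finalize(&self, acc: f32) -> f32 {
        acc
    }
}

/// Quantized path: sum code-space partials, rescale once at the end.
pub struct RescaleReduce {
    /// Product of the two vectors' scale factors.
    pub scale: f32,
}

impl Reduce for RescaleReduce {
    type Acc = f32;
    fn init(&self) -> f32 {
        0.0
    }
    fn accumulate(&self, acc: f32, p: f32) -> f32 {
        acc + p
    }
    fn finalize(&self, acc: f32) -> f32 {
        acc * self.scale
    }
}

/// Scalar tile kernel for f32 inputs.
pub fn dot_f32(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Scalar tile kernel for u8 codes; widens to u32 before multiplying.
pub fn dot_u8(a: &[u8], b: &[u8]) -> f32 {
    a.iter().zip(b).map(|(&x, &y)| x as u32 * y as u32).sum::<u32>() as f32
}
```

Because the kernel and the reducer are separate parameters here, a SIMD kernel could be swapped in without touching the reducers, and generic dispatch leaves the compiler free to inline the reducer calls. Whether that actually holds against the AVX2 baseline would still need to be measured.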
