Add Reduce abstraction to support quantizations and modify tiled_reduce accordingly
Description
Building on the foundational tiled_reduce pipeline, this issue serves two main architectural purposes. First, it extracts the initially hardcoded reduction logic into a dedicated Reduce abstraction, which is strictly necessary to support multi-vector quantizations (handling scaling factors, offsets, and lookup tables).
Second, it improves the tiled_reduce traversal and reduction flow to better support quantized execution paths and extensible reduction strategies. The new abstraction boundary enables reduction implementations to specialize accumulation behavior, metadata handling, and intermediate storage layouts without coupling those concerns directly to the core kernel compute pipeline.
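A minimal sketch of how this boundary could look, assuming a design the issue does not spell out. Every name and signature below is hypothetical (the `Reduce` trait shape, the closure-based kernel, `PlainSum`, `ScaledSum`, and this `tiled_reduce` signature), and the quantized case covers only a per-vector scale, leaving out offsets and lookup tables. The kernel produces one raw partial result per tile, while the `Reduce` implementation picks the accumulator type and applies the metadata once at the end:

```rust
/// Hypothetical reduction strategy, generic over the kernel's per-tile output `P`.
pub trait Reduce<P> {
    /// Intermediate accumulator (storage layout is up to the implementation).
    type Acc;
    /// Per-vector quantization metadata; `()` when there is none.
    type Meta: Copy;

    fn init(&self) -> Self::Acc;
    /// Fold one tile's partial result into the accumulator.
    fn accumulate(&self, acc: Self::Acc, partial: P) -> Self::Acc;
    /// Apply metadata and produce the final similarity score.
    fn finish(&self, acc: Self::Acc, a: Self::Meta, b: Self::Meta) -> f32;
}

/// Walk both vectors tile by tile; the kernel computes, `Reduce` accumulates.
pub fn tiled_reduce<E, P, K, R>(
    kernel: K,
    reduce: &R,
    a: &[E],
    b: &[E],
    a_meta: R::Meta,
    b_meta: R::Meta,
    tile: usize,
) -> f32
where
    K: Fn(&[E], &[E]) -> P,
    R: Reduce<P>,
{
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    assert!(tile > 0, "tile size must be non-zero");
    let mut acc = reduce.init();
    for (ta, tb) in a.chunks(tile).zip(b.chunks(tile)) {
        acc = reduce.accumulate(acc, kernel(ta, tb));
    }
    reduce.finish(acc, a_meta, b_meta)
}

/// Unquantized path: plain f32 summation, no metadata.
pub struct PlainSum;
impl Reduce<f32> for PlainSum {
    type Acc = f32;
    type Meta = ();
    fn init(&self) -> f32 { 0.0 }
    fn accumulate(&self, acc: f32, partial: f32) -> f32 { acc + partial }
    fn finish(&self, acc: f32, _a: (), _b: ()) -> f32 { acc }
}

/// Scale-only quantized path (assumes x ≈ scale * code): integer partials are
/// summed in a wider integer and rescaled once by both per-vector scales.
pub struct ScaledSum;
impl Reduce<i32> for ScaledSum {
    type Acc = i64;
    type Meta = f32;
    fn init(&self) -> i64 { 0 }
    fn accumulate(&self, acc: i64, partial: i32) -> i64 { acc + i64::from(partial) }
    fn finish(&self, acc: i64, a: f32, b: f32) -> f32 { a * b * acc as f32 }
}

/// Stand-in kernels: per-tile dot products.
pub fn dot_f32(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

pub fn dot_i8(a: &[i8], b: &[i8]) -> i32 {
    a.iter().zip(b).map(|(&x, &y)| i32::from(x) * i32::from(y)).sum()
}
```

In this sketch, a call such as `tiled_reduce(dot_i8, &ScaledSum, &qa, &qb, scale_a, scale_b, tile)` leaves the traversal unchanged between the quantized and unquantized paths. Only the kernel and the `Reduce` implementation differ, which is the decoupling the description aims for.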
Tasks
- Add the Reduce abstraction to decouple the similarity reduction step from the core Kernel compute, ensuring it can handle quantization metadata efficiently.
- Modify tiled_reduce to route reduction and accumulation logic through the new Reduce abstraction.
- Update the f32 and f16 implementations to use the new Reduce abstraction.