remove stale xfail for batched decode paged-parity test #158
renqHIT wants to merge 1 commit into vllm-project:main
Conversation
The xfail on `test_batched_decode_matches` was added for issue vllm-project#119 (B=2 batched GEMM producing different floats than B=1). The test now passes consistently on main after recent paged kernel fixes (vllm-project#146, vllm-project#151). This follows PR vllm-project#149 which removed the same stale xfail for the greedy single-request test. Signed-off-by: Qiang <qren@integralads.com>
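For readers unfamiliar with the test being discussed, the idea behind a batched-vs-single decode parity check can be sketched with a toy stand-in model. This is not vllm-metal's actual test code; the "model" below is just a fixed embedding table plus a linear head with greedy argmax, so the single-sequence and batched paths are directly comparable (the real test exercises the Metal paged-attention kernel instead):

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, HIDDEN = 32, 16
W = rng.standard_normal((HIDDEN, VOCAB)).astype(np.float32)    # toy output head
EMB = rng.standard_normal((VOCAB, HIDDEN)).astype(np.float32)  # toy embeddings


def decode(prompt_ids, steps=8):
    """Greedy-decode one sequence (batch size 1)."""
    ids = list(prompt_ids)
    for _ in range(steps):
        h = EMB[ids].mean(axis=0)          # toy "hidden state" for the sequence
        ids.append(int(np.argmax(h @ W)))  # greedy next token
    return ids[len(prompt_ids):]


def decode_batched(prompts, steps=8):
    """Greedy-decode several sequences at once (same math, batched GEMM)."""
    seqs = [list(p) for p in prompts]
    for _ in range(steps):
        h = np.stack([EMB[s].mean(axis=0) for s in seqs])  # (B, HIDDEN)
        nxt = np.argmax(h @ W, axis=-1)                    # (B,) next tokens
        for s, t in zip(seqs, nxt):
            s.append(int(t))
    return [s[len(p):] for s, p in zip(seqs, prompts)]


# The parity property under test: batched decode must emit the same tokens
# as decoding each prompt alone. Issue #119 was exactly this check failing
# for B=2 on the real kernel.
prompts = [[1, 2, 3], [4, 5]]
assert decode_batched(prompts) == [decode(p) for p in prompts]
```

The real failure mode in #119 lived below this level: a B=2 batched GEMM producing slightly different floats than B=1, occasionally flipping an argmax and diverging the token stream.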
Hi @LxYuan0420 @WindChimeRan 👋 I'm new here and excited to have found this project! I'm running a Mac Studio with an M3 Ultra (256GB) and am very interested in efficient local inference across different model sizes. Still getting familiar with the codebase. This follows PR #149, which removed the same stale xfail for the greedy test; this PR removes the remaining one. Thanks!
LxYuan0420
left a comment
Thanks!
I tested it, and on an M2 Max (mlx 0.30.6) `test_batched_decode_matches` still fails deterministically with token divergence, so I don't think we can drop the xfail repo-wide yet. Could @WindChimeRan confirm whether batched parity is expected to be fixed across M1/M2 as well? If not, we should keep the xfail (or make it chip-conditional) for now.
Thanks for testing on M2 Max @LxYuan0420. I only have an M3 Ultra, so I couldn't catch the cross-chip difference. Makes sense to keep the xfail for now. Happy to help with a chip-conditional approach if needed after @WindChimeRan weighs in. My env: M3 Ultra 256GB, MLX 0.31.0, mlx_lm 0.29.1; passes 3/3 runs.
@LxYuan0420 @renqHIT Thanks for testing. Let's keep the test as is for now. We will have multiple kernel PRs soon, and they may flip tests on and off frequently. Once we have a stable varlen kernel, we can open a new issue to track numerical stability systematically.
…167) `test_metal_kernel_paged.py` re-implements vllm-metal internals (cache setup, prefill/decode orchestration, context management) to compare two paths. This scaffolding introduces additional complexity, making failures hard to attribute. Delete it and add its prompts to `test_paged_deterministic.py`, which does the same comparison end-to-end through the real vLLM stack against golden tokens. Related: #158 #149 #119 --------- Signed-off-by: ran <hzz5361@psu.edu>
Summary
- Remove the `xfail` marker on `test_batched_decode_matches` in `test_metal_kernel_paged.py`
- The `xfail` was added for issue "Metal paged-attention parity mismatch vs standard path" #119 (B=2 batched GEMM float divergence), but the test now passes consistently on `main` after recent paged kernel fixes: "[varlen Kernel] Paged varlen flash attention for Metal [1/n]" #146 and "[Continuous Batching] Packed prefill with cu_seq_lens for multiple requests" #151
- Follows #149, which removed the same stale `xfail` for `test_greedy_output_matches`

Test plan
- `test_batched_decode_matches` passes locally on M3 Ultra (was previously reported as XPASS)