remove stale issue #119 xfail for greedy paged-parity test by LxYuan0420 · Pull Request #149 · vllm-project/vllm-metal

LxYuan0420 · 2026-03-09T06:39:37Z

This PR:

Removes a stale xfail on test_greedy_output_matches that was originally added for issue #119 (Metal paged-attention parity mismatch vs standard path).
Aligns the test expectation with current main behavior now that the paged-path fixes have merged.
Keeps parity tracking accurate while leaving batched behavior to its own tracking path.

Context

Issue #119 reported token mismatch parity failures between:

standard MLX KV cache path, and
Metal paged-attention path.

Since then, two key fixes landed:

#125 (Fix paged-attention KV cache dtype + size accounting (issue #119)) corrected paged KV cache dtype inference/fallback behavior and the KV cache size accounting used by paged memory/block calculations.
#136 ([Paged KV] Inline metal kernel. deprecate hf pytorch kernel) replaced the HF/PyTorch kernel-bridge path with native MLX + inline Metal JIT dispatch (get_ops/nanobind), removing cross-framework bridge behavior from paged execution.

With those changes, the old greedy mismatch from #119 no longer reproduces on main, so the greedy xfail is stale.
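To illustrate why a stale xfail is worth removing: under pytest's documented xfail semantics, a failing test marked xfail does not fail the suite, so once the original bug is fixed the marker can only hide a future regression. The sketch below models those outcomes in plain Python. `xfail_outcome`, `plain_outcome`, and `suite_fails` are hypothetical helpers for illustration, not pytest APIs, and whether the removed marker was strict is an assumption left open here.

```python
def xfail_outcome(test_passed: bool, strict: bool = False) -> str:
    """Model the reported outcome of a test carrying an xfail marker."""
    if not test_passed:
        # An expected failure is reported as xfailed and does not fail the suite.
        return "xfailed"
    # An unexpected pass fails the suite only when the marker is strict.
    return "failed" if strict else "xpassed"


def plain_outcome(test_passed: bool) -> str:
    """Model the reported outcome of the same test with the marker removed."""
    return "passed" if test_passed else "failed"


def suite_fails(outcome: str) -> bool:
    """Only a 'failed' outcome turns the run red."""
    return outcome == "failed"


# Hypothetical future regression: the test starts failing again.
with_marker = xfail_outcome(test_passed=False)
without_marker = plain_outcome(test_passed=False)
print(with_marker, suite_fails(with_marker))
print(without_marker, suite_fails(without_marker))
```

With the marker in place the regression is absorbed as an expected failure; with it removed, the same regression fails the run, which is the behavior this PR restores for the greedy test.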

Verification

pytest -q tests/test_metal_kernel_paged.py::TestMetalKernelPagedVsStandard::test_greedy_output_matches -s
pytest -m slow -q tests/test_metal_kernel_paged.py
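As a rough picture of the property the first command exercises, greedy parity means both paths emit the same token IDs for the same prompt. The sketch below is a minimal stand-in, assuming each path yields a list of integer token IDs; `first_mismatch` and the sample token values are hypothetical and are not taken from tests/test_metal_kernel_paged.py.

```python
from typing import Optional, Sequence


def first_mismatch(standard: Sequence[int], paged: Sequence[int]) -> Optional[int]:
    """Return the index of the first differing token, or None if the outputs match."""
    for i, (a, b) in enumerate(zip(standard, paged)):
        if a != b:
            return i
    if len(standard) != len(paged):
        # One output is a strict prefix of the other.
        return min(len(standard), len(paged))
    return None


# Made-up token IDs standing in for the two decode paths.
standard_tokens = [101, 7, 42, 9]
paged_tokens = [101, 7, 42, 9]
assert first_mismatch(standard_tokens, paged_tokens) is None
```

Reporting the first diverging index, rather than only a boolean, makes a parity failure easier to attribute to a specific decode step.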

Signed-off-by: Yuan Lik Xun <lxyuan0420@gmail.com>

WindChimeRan

LGTM. safe to merge. (seems like we still need @ericcurtin to merge these)

The xfail on `test_batched_decode_matches` was added for issue vllm-project#119 (B=2 batched GEMM producing different floats than B=1). The test now passes consistently on main after recent paged kernel fixes (vllm-project#146, vllm-project#151). This follows PR vllm-project#149 which removed the same stale xfail for the greedy single-request test.
Signed-off-by: Qiang <qren@integralads.com>

…167) `test_metal_kernel_paged.py` re-implements vllm-metal internals (cache setup, prefill/decode orchestration, context management) to compare two paths. This scaffolding introduces additional complexity, making failures hard to attribute. Delete it and add its prompts to `test_paged_deterministic.py`, which does the same comparison end-to-end through the real vLLM stack against golden tokens.
Related: #158 #149 #119
Signed-off-by: ran <hzz5361@psu.edu>

Remove stale vllm-project#119 xfail for greedy parity

fe4a3c9

Signed-off-by: Yuan Lik Xun <lxyuan0420@gmail.com>

LxYuan0420 self-assigned this Mar 9, 2026

LxYuan0420 requested a review from ericcurtin March 9, 2026 06:39

WindChimeRan approved these changes Mar 9, 2026


LxYuan0420 merged commit 97a2844 into vllm-project:main Mar 11, 2026
5 checks passed

renqHIT mentioned this pull request Mar 13, 2026

remove stale xfail for batched decode paged-parity test #158

Closed

2 tasks

WindChimeRan mentioned this pull request Mar 17, 2026

[Test] Consolidate paged kernel tests into deterministic golden test #167

Merged

LxYuan0420 merged 1 commit into vllm-project:main from LxYuan0420:chore/remove-stale-issue119-xfail
