Conversation
- Upgrade vLLM to main branch (includes Qwen3.5 model registry)
- Upgrade mlx-lm (>=0.30.7) and mlx-vlm (>=0.3.12) for Qwen3.5 model implementations
- Upgrade transformers (>=5.2.0) required by newer mlx-lm/mlx-vlm
- Patch rope validation compatibility between vLLM and transformers 5.x (list vs set type mismatch for ignore_keys_at_rope_validation)

Signed-off-by: otarkhan <osama.taha1994@gmail.com>
Force-pushed from 74224da to 8906c26
LxYuan0420 left a comment:
Nice work, but a few comments.
Could you split this into smaller pieces?
- Keep install.sh pinned to a known-good vLLM release/commit (no main), and make the Qwen3.5 dependency bumps opt-in (extra/flag/separate script).
- If we need the rope compatibility shim, please isolate it (compat module), add tight guards, and include a unit test demonstrating the list vs set mismatch and confirming the fix.
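A focused test for that shim could look like the sketch below. It uses a stand-in for transformers' `convert_rope_params_to_dict` rather than the real API, so the function names and the `"mrope_section"` key are illustrative only, not the actual vllm-metal test:

```python
def fake_convert(ignore_keys_at_rope_validation=None):
    # Stand-in for transformers 5.x behavior: it combines the keys with the
    # set-union operator, which raises TypeError when given a list.
    keys = ignore_keys_at_rope_validation or set()
    return keys | {"rope_type"}


def patched_convert(ignore_keys_at_rope_validation=None):
    # The compat shim under test: coerce list -> set before delegating.
    if isinstance(ignore_keys_at_rope_validation, list):
        ignore_keys_at_rope_validation = set(ignore_keys_at_rope_validation)
    return fake_convert(ignore_keys_at_rope_validation)


def test_list_input_breaks_unpatched():
    # Demonstrates the failure mode: list | set is unsupported.
    try:
        fake_convert(ignore_keys_at_rope_validation=["mrope_section"])
    except TypeError:
        pass  # expected
    else:
        raise AssertionError("expected TypeError for list input")


def test_patched_accepts_list():
    # Demonstrates the fix: a list argument is coerced and merged cleanly.
    result = patched_convert(ignore_keys_at_rope_validation=["mrope_section"])
    assert result == {"mrope_section", "rope_type"}


test_list_input_breaks_unpatched()
test_patched_accepts_list()
```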
One extra concern: this switches the installer to vllm@main + newer mlx-lm/mlx-vlm/transformers, which can drift the torch stack (we’ve previously hit torch/torchvision mismatches on macOS when vLLM bumps pulled torch 2.10).
Also, mlx-lm upgrades can include breaking API changes: we already saw CI failures from the RotatingKVCache API change (len(cache) → cache.size() in mlx-lm v0.30.7). That’s exactly why we need to be extra careful with “upgrade the stack” PRs.
https://github.com/ml-explore/mlx-lm/blob/v0.30.7/mlx_lm/models/cache.py#L494
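That kind of breakage can also be absorbed with feature detection instead of a hard pin. A minimal sketch, assuming only that the old API supported `len(cache)` and the new one exposes a `size()` method (the `OldCache`/`NewCache` classes are illustrative stand-ins, not mlx-lm types):

```python
def kv_cache_len(cache):
    # mlx-lm v0.30.7 replaced len(cache) with cache.size() on
    # RotatingKVCache, so feature-detect rather than assume either API.
    size = getattr(cache, "size", None)
    if callable(size):
        return size()
    return len(cache)


# Illustrative stand-ins for the pre- and post-v0.30.7 cache shapes.
class OldCache:
    def __len__(self):
        return 3


class NewCache:
    def size(self):
        return 3


assert kv_cache_len(OldCache()) == 3
assert kv_cache_len(NewCache()) == 3
```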
```shell
local vllm_repo="https://github.com/vllm-project/vllm"
local vllm_ref="main"
```
This switches installer from a pinned vLLM release to building from vllm@main. That’s not reproducible and is too risky for the default install path.
Can we keep install.sh pinned to a known-good vLLM tag/commit, and move “Qwen3.5 requires vLLM main + transformers 5.x” into an opt-in path (separate extra)?
```shell
rm -rf "${vllm_dir}" "${vllm_dir}.tar.gz"

# Upgrade dependencies for newer model support (e.g., Qwen3.5)
uv pip install 'mlx-lm>=0.30.7' 'mlx-vlm>=0.3.12' 'transformers>=5.2.0'
```
Please don't force-upgrade these globally in the default installer; it will drift users' environments and can break unrelated models.
Either make the Qwen3.5 dependency bumps opt-in (extra/flag like .[qwen35]), or add enough testing to show the new default stack is stable.
```python
def _patch_rope_validation_compat() -> None:
    """Fix list vs set type mismatch for ignore_keys_at_rope_validation.

    vLLM's model configs (e.g. Qwen3_5MoeTextConfig) pass
    ignore_keys_at_rope_validation as a list, but transformers 5.x's
    convert_rope_params_to_dict expects a set (uses ``|`` operator).
    """
    try:
        from transformers.modeling_rope_utils import RotaryEmbeddingConfigMixin
    except ImportError:
        return

    orig = RotaryEmbeddingConfigMixin.convert_rope_params_to_dict
    if getattr(orig, "_metal_patched", False):
        return

    def _patched(self, ignore_keys_at_rope_validation=None, **kwargs):
        if isinstance(ignore_keys_at_rope_validation, list):
            ignore_keys_at_rope_validation = set(ignore_keys_at_rope_validation)
        return orig(
            self,
            ignore_keys_at_rope_validation=ignore_keys_at_rope_validation,
            **kwargs,
        )

    _patched._metal_patched = True
    RotaryEmbeddingConfigMixin.convert_rope_params_to_dict = _patched
```
I’m wary of a global monkeypatch in `__init__.py`. If we keep this, can we (1) isolate it in a compat module, (2) guard by transformers version / feature detection, and (3) add a focused unit test proving the failure + fix? Otherwise this is a hidden global behavior change.
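For concreteness, one possible shape for such an isolated, guarded compat module; the `vllm_metal/compat.py` location, function name, and the `5` version cutoff are assumptions, not the project's actual layout:

```python
"""Hypothetical vllm_metal/compat.py: isolated rope-validation shim."""


def apply_rope_validation_compat() -> None:
    # Guard 1: only act when transformers (and the mixin) are importable.
    try:
        import transformers
        from transformers.modeling_rope_utils import RotaryEmbeddingConfigMixin
    except ImportError:
        return

    # Guard 2: the mismatch only exists on transformers 5.x; leave older
    # (and any future fixed) releases untouched.
    from packaging.version import Version
    if Version(transformers.__version__) < Version("5"):
        return

    orig = getattr(RotaryEmbeddingConfigMixin, "convert_rope_params_to_dict", None)
    if orig is None or getattr(orig, "_metal_patched", False):
        return  # Guard 3: method missing, or already patched (idempotent)

    def _patched(self, ignore_keys_at_rope_validation=None, **kwargs):
        # Coerce list -> set so the set-union in transformers 5.x works.
        if isinstance(ignore_keys_at_rope_validation, list):
            ignore_keys_at_rope_validation = set(ignore_keys_at_rope_validation)
        return orig(
            self,
            ignore_keys_at_rope_validation=ignore_keys_at_rope_validation,
            **kwargs,
        )

    _patched._metal_patched = True
    RotaryEmbeddingConfigMixin.convert_rope_params_to_dict = _patched
```

Callers would invoke `apply_rope_validation_compat()` explicitly (e.g. from model setup) rather than having the patch run as an import side effect, which keeps the behavior change visible and testable.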
Btw, this works:

```shell
# fresh env; vllm-metal requires Python 3.12 or 3.13
uv venv --python 3.13 --seed ~/.venvs/qwen35-metal
source ~/.venvs/qwen35-metal/bin/activate

# --- build core vLLM from source ---
# git clone https://github.com/vllm-project/vllm.git
# cd vllm
cd ~/vllm

# choose the core revision you want to test
# if you specifically want 0.17.0:
# git checkout v0.17.0
git fetch --tags
git checkout v0.16.1rc0

uv pip install -r requirements/cpu.txt --index-strategy unsafe-best-match
# uv pip install -r requirements/cpu-build.txt --torch-backend cpu
# uv pip install -r requirements/cpu.txt --torch-backend cpu
uv pip install setuptools_scm cmake --no-build-isolation
uv pip install -e . --no-build-isolation
# VLLM_TARGET_DEVICE=cpu uv pip install -e . --no-build-isolation
python -c "import vllm; print(vllm.__version__)"
cd ..

# --- install vllm-metal from PR #123 ---
cd
git clone https://github.com/vllm-project/vllm-metal.git
cd vllm-metal
git fetch origin pull/123/head:pr-123
git checkout pr-123
uv pip install maturin puccinialin --no-build-isolation
uv pip install -U mlx-lm mlx-vlm --no-build-isolation
uv pip install -e . --no-build-isolation
python -c "import vllm_metal; print('vllm-metal import ok')"
```

Edited 3/10/2026: mlx-lm needs to be
bump
Hi @LxYuan0420, I think we can close this PR. Qwen3.5 support is already handled in #174 and #169 with a safer approach. Thanks!
Summary
- Adds Qwen3.5 model support (`Qwen3_5MoeForConditionalGeneration`, `Qwen3_5ForConditionalGeneration`)
- vLLM passes `ignore_keys_at_rope_validation` as a list, but transformers 5.x expects a set (uses the `|` operator). The patch coerces list → set in `convert_rope_params_to_dict` and is idempotent (guarded against double-application).

Test plan
- `vllm serve` with `Qwen3.5-397B-A17B-4bit` (MoE, multimodal) loads and serves successfully
- `vllm serve` with `Qwen3-Coder-Next` loads successfully