
Add Qwen3.5 model support #123

Closed
otarkhan wants to merge 1 commit into vllm-project:main from otarkhan:support-qwen3.5-models

Conversation

@otarkhan
Contributor

Summary

  • Upgrade vLLM to main branch which includes the Qwen3.5 model registry (Qwen3_5MoeForConditionalGeneration, Qwen3_5ForConditionalGeneration)
  • Upgrade mlx-lm (>=0.30.7) and mlx-vlm (>=0.3.12) which include Qwen3.5 and Qwen3.5-MoE model implementations
  • Upgrade transformers (>=5.2.0) as required by newer mlx-lm/mlx-vlm
  • Patch rope validation compatibility between vLLM and transformers 5.x — vLLM's model configs pass ignore_keys_at_rope_validation as a list, but transformers 5.x expects a set (uses the | operator). The patch coerces list→set in convert_rope_params_to_dict and is idempotent (guarded against double-application).
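The type mismatch described above can be reproduced in a few lines without importing transformers at all (the key names below are illustrative, not the real config keys):

```python
# transformers 5.x merges ignore keys with the set-union operator, which
# raises TypeError when one operand is a set and the other is a list.
default_keys = {"rope_type"}        # what the library holds internally
config_keys = ["mrope_section"]     # vLLM model configs pass a list

try:
    merged = default_keys | config_keys  # set | list -> TypeError
except TypeError as exc:
    print(f"TypeError: {exc}")

# coercing list -> set, as the patch does, makes the union succeed
merged = default_keys | set(config_keys)
print(sorted(merged))  # ['mrope_section', 'rope_type']
```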

Test plan

  • Verified vllm serve with Qwen3.5-397B-A17B-4bit (MoE, multimodal) loads and serves successfully
  • Verified vllm serve with Qwen3-Coder-Next loads successfully

- Upgrade vLLM to main branch (includes Qwen3.5 model registry)
- Upgrade mlx-lm (>=0.30.7) and mlx-vlm (>=0.3.12) for Qwen3.5 model
  implementations
- Upgrade transformers (>=5.2.0) required by newer mlx-lm/mlx-vlm
- Patch rope validation compatibility between vLLM and transformers 5.x
  (list vs set type mismatch for ignore_keys_at_rope_validation)

Signed-off-by: otarkhan <osama.taha1994@gmail.com>
@otarkhan force-pushed the support-qwen3.5-models branch from 74224da to 8906c26 on February 28, 2026 01:06
Collaborator

@LxYuan0420 left a comment


Nice work, but a few comments.

Could you split this into smaller pieces?

  • Keep install.sh pinned to a known-good vLLM release/commit (no main), and make the Qwen3.5 dependency bumps opt-in (extra/flag/separate script).
  • If we need the rope compatibility shim, please isolate it in a compat module, add tight guards, and include a unit test demonstrating the list vs set mismatch and confirming the fix.

One extra concern: this switches the installer to vllm@main + newer mlx-lm/mlx-vlm/transformers, which can drift the torch stack (we’ve previously hit torch/torchvision mismatches on macOS when vLLM bumps pulled torch 2.10).

Also, mlx-lm upgrades can include breaking API changes: we already saw CI failures from the RotatingKVCache API change (len(cache) → cache.size() in mlx-lm v0.30.7). That’s exactly why we need to be extra careful with “upgrade the stack” PRs.

https://github.com/ml-explore/mlx-lm/blob/v0.30.7/mlx_lm/models/cache.py#L494
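A version-tolerant accessor is one way to absorb that kind of API change; the sketch below uses stand-in cache classes (not the real mlx-lm types) purely to illustrate the pattern:

```python
# Sketch of a helper that works across the len(cache) -> cache.size()
# change the comment above mentions. OldCache/NewCache are stand-ins
# for the pre- and post-0.30.7 mlx-lm cache shapes, not real types.

def cache_size(cache):
    """Return the cache length under either API."""
    size = getattr(cache, "size", None)
    if callable(size):
        return size()      # newer API: cache.size()
    return len(cache)      # older API: len(cache)

class OldCache:            # old style: supports len()
    def __len__(self):
        return 3

class NewCache:            # new style: exposes size()
    def size(self):
        return 3

print(cache_size(OldCache()), cache_size(NewCache()))  # 3 3
```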

Comment on lines +126 to +127
local vllm_repo="https://github.com/vllm-project/vllm"
local vllm_ref="main"

This switches installer from a pinned vLLM release to building from vllm@main. That’s not reproducible and is too risky for the default install path.

Can we keep install.sh pinned to a known-good vLLM tag/commit, and move “Qwen3.5 requires vLLM main + transformers 5.x” into an opt-in path (separate extra)?

rm -rf "${vllm_dir}" "${vllm_dir}.tar.gz"

# Upgrade dependencies for newer model support (e.g., Qwen3.5)
uv pip install 'mlx-lm>=0.30.7' 'mlx-vlm>=0.3.12' 'transformers>=5.2.0'

Please don't force-upgrade these globally in the default installer; it will drift users’ environments and can break unrelated models.

Either make the Qwen3.5 dependency bumps opt-in (extra/flag like .[qwen35]), or add enough testing to show the new default stack is stable.

Comment on lines +107 to +133
def _patch_rope_validation_compat() -> None:
    """Fix list vs set type mismatch for ignore_keys_at_rope_validation.

    vLLM's model configs (e.g. Qwen3_5MoeTextConfig) pass
    ignore_keys_at_rope_validation as a list, but transformers 5.x's
    convert_rope_params_to_dict expects a set (uses ``|`` operator).
    """
    try:
        from transformers.modeling_rope_utils import RotaryEmbeddingConfigMixin
    except ImportError:
        return

    orig = RotaryEmbeddingConfigMixin.convert_rope_params_to_dict
    if getattr(orig, "_metal_patched", False):
        return

    def _patched(self, ignore_keys_at_rope_validation=None, **kwargs):
        if isinstance(ignore_keys_at_rope_validation, list):
            ignore_keys_at_rope_validation = set(ignore_keys_at_rope_validation)
        return orig(
            self,
            ignore_keys_at_rope_validation=ignore_keys_at_rope_validation,
            **kwargs,
        )

    _patched._metal_patched = True
    RotaryEmbeddingConfigMixin.convert_rope_params_to_dict = _patched

I’m wary of a global monkeypatch in __init__.py. If we keep this, can we (1) isolate it in a compat module, (2) guard it by transformers version or feature detection, and (3) add a focused unit test proving the failure and the fix? Otherwise this is a hidden global behavior change.
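A focused test along those lines can be written without importing transformers at all; in this sketch, FakeMixin is a hypothetical stand-in for RotaryEmbeddingConfigMixin, and patch() mirrors the logic of the PR's _patch_rope_validation_compat:

```python
# Self-contained sketch of the requested unit test. FakeMixin reproduces
# the failing set-union against a list argument; patch() mirrors the
# PR's coercion-and-guard logic against the stand-in class.

class FakeMixin:
    def convert_rope_params_to_dict(self, ignore_keys_at_rope_validation=None):
        keys = ignore_keys_at_rope_validation or set()
        return {"ignored": {"rope_type"} | keys}  # fails if keys is a list

def patch(cls):
    orig = cls.convert_rope_params_to_dict
    if getattr(orig, "_metal_patched", False):
        return  # idempotent: never wrap twice

    def _patched(self, ignore_keys_at_rope_validation=None, **kwargs):
        if isinstance(ignore_keys_at_rope_validation, list):
            ignore_keys_at_rope_validation = set(ignore_keys_at_rope_validation)
        return orig(
            self,
            ignore_keys_at_rope_validation=ignore_keys_at_rope_validation,
            **kwargs,
        )

    _patched._metal_patched = True
    cls.convert_rope_params_to_dict = _patched

# demonstrate the failure, then the fix
m = FakeMixin()
try:
    m.convert_rope_params_to_dict(["mrope_section"])
    raise AssertionError("expected TypeError before patching")
except TypeError:
    pass

patch(FakeMixin)
patch(FakeMixin)  # second call is a no-op thanks to the guard
out = m.convert_rope_params_to_dict(["mrope_section"])
assert out["ignored"] == {"rope_type", "mrope_section"}
print("patch works and is idempotent")
```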

@zhanwenchen

zhanwenchen commented Mar 9, 2026

@otarkhan @LxYuan0420

Btw, this works:

# fresh env; vllm-metal requires Python 3.12 or 3.13
uv venv --python 3.13 --seed ~/.venvs/qwen35-metal
source ~/.venvs/qwen35-metal/bin/activate

# --- build core vLLM from source ---
# git clone https://github.com/vllm-project/vllm.git
# cd vllm
cd ~/vllm

# choose the core revision you want to test
# if you specifically want 0.17.0:
# git checkout v0.17.0
git fetch --tags
git checkout v0.16.1rc0

uv pip install -r requirements/cpu.txt --index-strategy unsafe-best-match
# uv pip install -r requirements/cpu-build.txt --torch-backend cpu
# uv pip install -r requirements/cpu.txt --torch-backend cpu
uv pip install setuptools_scm cmake --no-build-isolation
uv pip install -e . --no-build-isolation

# VLLM_TARGET_DEVICE=cpu uv pip install -e . --no-build-isolation

python -c "import vllm; print(vllm.__version__)"
cd ..

# --- install vllm-metal from PR #123 ---
cd
git clone https://github.com/vllm-project/vllm-metal.git
cd vllm-metal
git fetch origin pull/123/head:pr-123
git checkout pr-123

uv pip install maturin puccinialin --no-build-isolation
uv pip install -U mlx-lm mlx-vlm --no-build-isolation
uv pip install -e . --no-build-isolation

python -c "import vllm_metal; print('vllm-metal import ok')"

Edited 3/10/2026: mlx-lm needs to be v0.31.0 and mlx-vlm needs to be at least v0.3.12 (latest v0.4.0). Otherwise you get ValueError: Model type qwen3_5 not supported. (lmstudio-ai/mlx-engine#284 (comment))
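A quick sanity check against those minimum versions can be done from the standard library; this assumes the PyPI distribution names "mlx-lm" and "mlx-vlm" and plain X.Y.Z version strings:

```python
# Check installed package versions against the minimums mentioned above.
from importlib.metadata import PackageNotFoundError, version

def check(pkg, minimum):
    try:
        installed = version(pkg)
    except PackageNotFoundError:
        return f"{pkg}: not installed (need >= {minimum})"
    # naive numeric tuple compare; fine for plain X.Y.Z versions
    ok = tuple(map(int, installed.split("."))) >= tuple(map(int, minimum.split(".")))
    status = "ok" if ok else f"too old, need >= {minimum}"
    return f"{pkg} {installed}: {status}"

for pkg, minimum in [("mlx-lm", "0.31.0"), ("mlx-vlm", "0.3.12")]:
    print(check(pkg, minimum))
```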

@0xClandestine

bump

@ricky-chaoju
Contributor

Hi @LxYuan0420,

I think we can close this PR. Qwen3.5 support is already handled in #174 and #169 with a safer approach.

Thanks!

@LxYuan0420 LxYuan0420 closed this Mar 19, 2026