
Add Qwen3.5 model support #123

Closed
otarkhan wants to merge 1 commit into vllm-project:main from otarkhan:support-qwen3.5-models

Conversation

@otarkhan
Contributor

Summary

  • Upgrade vLLM to main branch which includes the Qwen3.5 model registry (Qwen3_5MoeForConditionalGeneration, Qwen3_5ForConditionalGeneration)
  • Upgrade mlx-lm (>=0.30.7) and mlx-vlm (>=0.3.12) which include Qwen3.5 and Qwen3.5-MoE model implementations
  • Upgrade transformers (>=5.2.0) as required by newer mlx-lm/mlx-vlm
  • Patch rope validation compatibility between vLLM and transformers 5.x — vLLM's model configs pass ignore_keys_at_rope_validation as a list, but transformers 5.x expects a set (uses the | operator). The patch coerces list→set in convert_rope_params_to_dict and is idempotent (guarded against double-application).
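The type mismatch described above can be reproduced in a few lines without importing transformers at all (the key names below are illustrative, not the real config keys):

```python
# transformers 5.x merges ignore keys with the set-union operator, which
# raises TypeError when one operand is a set and the other is a list.
default_keys = {"rope_type"}        # what the library holds internally
config_keys = ["mrope_section"]     # vLLM model configs pass a list

try:
    merged = default_keys | config_keys  # set | list -> TypeError
except TypeError as exc:
    print(f"TypeError: {exc}")

# coercing list -> set, as the patch does, makes the union succeed
merged = default_keys | set(config_keys)
print(sorted(merged))  # ['mrope_section', 'rope_type']
```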

Test plan

  • Verified vllm serve with Qwen3.5-397B-A17B-4bit (MoE, multimodal) loads and serves successfully
  • Verified vllm serve with Qwen3-Coder-Next loads successfully

- Upgrade vLLM to main branch (includes Qwen3.5 model registry)
- Upgrade mlx-lm (>=0.30.7) and mlx-vlm (>=0.3.12) for Qwen3.5 model
  implementations
- Upgrade transformers (>=5.2.0) required by newer mlx-lm/mlx-vlm
- Patch rope validation compatibility between vLLM and transformers 5.x
  (list vs set type mismatch for ignore_keys_at_rope_validation)

Signed-off-by: otarkhan <osama.taha1994@gmail.com>
@otarkhan force-pushed the support-qwen3.5-models branch from 74224da to 8906c26 on February 28, 2026 01:06
Collaborator

@LxYuan0420 left a comment


Nice work, but a few comments.

Could you split this into smaller pieces?

  • Keep install.sh pinned to a known-good vLLM release/commit (no main), and make the Qwen3.5 dependency bumps opt-in (extra/flag/separate script).
  • If we need the rope compatibility shim, please isolate it in a compat module, add tight guards, and include a unit test demonstrating the list vs set mismatch and confirming the fix.

One extra concern: this switches the installer to vllm@main + newer mlx-lm/mlx-vlm/transformers, which can drift the torch stack (we’ve previously hit torch/torchvision mismatches on macOS when vLLM bumps pulled torch 2.10).

Also, mlx-lm upgrades can include breaking API changes: we already saw CI failures from the RotatingKVCache API change (len(cache) → cache.size() in mlx-lm v0.30.7). That’s exactly why we need to be extra careful with “upgrade the stack” PRs.

https://github.com/ml-explore/mlx-lm/blob/v0.30.7/mlx_lm/models/cache.py#L494
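A version-tolerant accessor is one way to absorb that kind of API change; the sketch below uses stand-in cache classes (not the real mlx-lm types) purely to illustrate the pattern:

```python
# Sketch of a helper that works across the len(cache) -> cache.size()
# change the comment above mentions. OldCache/NewCache are stand-ins
# for the pre- and post-0.30.7 mlx-lm cache shapes, not real types.

def cache_size(cache):
    """Return the cache length under either API."""
    size = getattr(cache, "size", None)
    if callable(size):
        return size()      # newer API: cache.size()
    return len(cache)      # older API: len(cache)

class OldCache:            # old style: supports len()
    def __len__(self):
        return 3

class NewCache:            # new style: exposes size()
    def size(self):
        return 3

print(cache_size(OldCache()), cache_size(NewCache()))  # 3 3
```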

Comment on lines +126 to +127
local vllm_repo="https://github.com/vllm-project/vllm"
local vllm_ref="main"

This switches installer from a pinned vLLM release to building from vllm@main. That’s not reproducible and is too risky for the default install path.

Can we keep install.sh pinned to a known-good vLLM tag/commit, and move “Qwen3.5 requires vLLM main + transformers 5.x” into an opt-in path (separate extra)?

rm -rf "${vllm_dir}" "${vllm_dir}.tar.gz"

# Upgrade dependencies for newer model support (e.g., Qwen3.5)
uv pip install 'mlx-lm>=0.30.7' 'mlx-vlm>=0.3.12' 'transformers>=5.2.0'

Please don't force-upgrade these globally in the default installer; it will drift users’ environments and can break unrelated models.

Either make the Qwen3.5 dependency bumps opt-in (extra/flag like .[qwen35]), or add enough testing to show the new default stack is stable.

Comment on lines +107 to +133
def _patch_rope_validation_compat() -> None:
    """Fix list vs set type mismatch for ignore_keys_at_rope_validation.

    vLLM's model configs (e.g. Qwen3_5MoeTextConfig) pass
    ignore_keys_at_rope_validation as a list, but transformers 5.x's
    convert_rope_params_to_dict expects a set (uses ``|`` operator).
    """
    try:
        from transformers.modeling_rope_utils import RotaryEmbeddingConfigMixin
    except ImportError:
        return

    orig = RotaryEmbeddingConfigMixin.convert_rope_params_to_dict
    if getattr(orig, "_metal_patched", False):
        return

    def _patched(self, ignore_keys_at_rope_validation=None, **kwargs):
        if isinstance(ignore_keys_at_rope_validation, list):
            ignore_keys_at_rope_validation = set(ignore_keys_at_rope_validation)
        return orig(
            self,
            ignore_keys_at_rope_validation=ignore_keys_at_rope_validation,
            **kwargs,
        )

    _patched._metal_patched = True
    RotaryEmbeddingConfigMixin.convert_rope_params_to_dict = _patched

I’m wary of a global monkeypatch in __init__.py. If we keep this, can we (1) isolate it in a compat module, (2) guard it by transformers version or feature detection, and (3) add a focused unit test proving the failure and the fix? Otherwise this is a hidden global behavior change.
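A focused test along those lines can be written without importing transformers at all; in this sketch, FakeMixin is a hypothetical stand-in for RotaryEmbeddingConfigMixin, and patch() mirrors the logic of the PR's _patch_rope_validation_compat:

```python
# Self-contained sketch of the requested unit test. FakeMixin reproduces
# the failing set-union against a list argument; patch() mirrors the
# PR's coercion-and-guard logic against the stand-in class.

class FakeMixin:
    def convert_rope_params_to_dict(self, ignore_keys_at_rope_validation=None):
        keys = ignore_keys_at_rope_validation or set()
        return {"ignored": {"rope_type"} | keys}  # fails if keys is a list

def patch(cls):
    orig = cls.convert_rope_params_to_dict
    if getattr(orig, "_metal_patched", False):
        return  # idempotent: never wrap twice

    def _patched(self, ignore_keys_at_rope_validation=None, **kwargs):
        if isinstance(ignore_keys_at_rope_validation, list):
            ignore_keys_at_rope_validation = set(ignore_keys_at_rope_validation)
        return orig(
            self,
            ignore_keys_at_rope_validation=ignore_keys_at_rope_validation,
            **kwargs,
        )

    _patched._metal_patched = True
    cls.convert_rope_params_to_dict = _patched

# demonstrate the failure, then the fix
m = FakeMixin()
try:
    m.convert_rope_params_to_dict(["mrope_section"])
    raise AssertionError("expected TypeError before patching")
except TypeError:
    pass

patch(FakeMixin)
patch(FakeMixin)  # second call is a no-op thanks to the guard
out = m.convert_rope_params_to_dict(["mrope_section"])
assert out["ignored"] == {"rope_type", "mrope_section"}
print("patch works and is idempotent")
```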

@zhanwenchen

zhanwenchen commented Mar 9, 2026

@otarkhan @LxYuan0420

Btw, this works:

# fresh env; vllm-metal requires Python 3.12 or 3.13
uv venv --python 3.13 --seed ~/.venvs/qwen35-metal
source ~/.venvs/qwen35-metal/bin/activate

# --- build core vLLM from source ---
# git clone https://github.com/vllm-project/vllm.git
# cd vllm
cd ~/vllm

# choose the core revision you want to test
# if you specifically want 0.17.0:
# git checkout v0.17.0
git fetch --tags
git checkout v0.16.1rc0

uv pip install -r requirements/cpu.txt --index-strategy unsafe-best-match
# uv pip install -r requirements/cpu-build.txt --torch-backend cpu
# uv pip install -r requirements/cpu.txt --torch-backend cpu
uv pip install setuptools_scm cmake --no-build-isolation
uv pip install -e . --no-build-isolation

# VLLM_TARGET_DEVICE=cpu uv pip install -e . --no-build-isolation

python -c "import vllm; print(vllm.__version__)"
cd ..

# --- install vllm-metal from PR #123 ---
cd
git clone https://github.com/vllm-project/vllm-metal.git
cd vllm-metal
git fetch origin pull/123/head:pr-123
git checkout pr-123

uv pip install maturin puccinialin --no-build-isolation
uv pip install -U mlx-lm mlx-vlm --no-build-isolation
uv pip install -e . --no-build-isolation

python -c "import vllm_metal; print('vllm-metal import ok')"

Edited 3/10/2026: mlx-lm needs to be v0.31.0 and mlx-vlm needs to be at least v0.3.12 (latest v0.4.0). Otherwise you get ValueError: Model type qwen3_5 not supported. (lmstudio-ai/mlx-engine#284 (comment))
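A quick sanity check against those minimum versions can be done from the standard library; this assumes the PyPI distribution names "mlx-lm" and "mlx-vlm" and plain X.Y.Z version strings:

```python
# Check installed package versions against the minimums mentioned above.
from importlib.metadata import PackageNotFoundError, version

def check(pkg, minimum):
    try:
        installed = version(pkg)
    except PackageNotFoundError:
        return f"{pkg}: not installed (need >= {minimum})"
    # naive numeric tuple compare; fine for plain X.Y.Z versions
    ok = tuple(map(int, installed.split("."))) >= tuple(map(int, minimum.split(".")))
    status = "ok" if ok else f"too old, need >= {minimum}"
    return f"{pkg} {installed}: {status}"

for pkg, minimum in [("mlx-lm", "0.31.0"), ("mlx-vlm", "0.3.12")]:
    print(check(pkg, minimum))
```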

@0xClandestine

bump

@ricky-chaoju
Contributor

Hi @LxYuan0420,

I think we can close this PR. Qwen3.5 support is already handled in #174 and #169 with a safer approach.

Thanks!

@LxYuan0420 LxYuan0420 closed this Mar 19, 2026