
feat: support mm_token_type_ids and 3D RoPE alignment for Qwen2/3-VL … #437

Open
Wangxiaoxiaoa wants to merge 1 commit into alibaba:main from Wangxiaoxiaoa:feat/qwen_vl_support


@Wangxiaoxiaoa

Description

This PR introduces the missing support for mm_token_type_ids and binds the necessary 3D RoPE position generation methods for Qwen2-VL and Qwen3-VL models.
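For context on 3D RoPE: as we understand the Hugging Face Qwen2-VL implementation (`get_rope_index`), every token carries three position ids (temporal, height, width). Text tokens use the same id on all three axes. Vision tokens get their grid coordinates offset by the position where the image starts, and the text that follows resumes one past the largest id the image used. The following is a simplified, single-sequence sketch of that scheme under stated assumptions, not the code added in this PR. The function name `mrope_position_ids`, the 0 = text / 1 = image token-type convention, and the pre-merged `(t, h, w)` grids are hypothetical. Video time scaling, batching and padding are ignored, and details may differ for Qwen3-VL.

```python
from typing import List, Sequence, Tuple

TEXT, IMAGE = 0, 1  # hypothetical token-type convention for this sketch


def mrope_position_ids(
    token_types: Sequence[int],
    grids: Sequence[Tuple[int, int, int]],
) -> List[List[int]]:
    """Return [t_ids, h_ids, w_ids] for one unpadded sequence.

    grids holds one (t, h, w) grid per image, already divided by the
    spatial merge size, so t * h * w equals that image's token count.
    """
    t_ids: List[int] = []
    h_ids: List[int] = []
    w_ids: List[int] = []
    next_pos = 0  # first position id free for the next token
    grid_idx = 0
    i = 0
    while i < len(token_types):
        if token_types[i] == TEXT:
            # Text tokens use the same id on all three axes.
            for ids in (t_ids, h_ids, w_ids):
                ids.append(next_pos)
            next_pos += 1
            i += 1
            continue
        if grid_idx >= len(grids):
            raise ValueError("more image spans than grids")
        t, h, w = grids[grid_idx]
        grid_idx += 1
        n = t * h * w
        if i + n > len(token_types) or any(x != IMAGE for x in token_types[i:i + n]):
            raise ValueError("image token span does not match its grid")
        # Vision tokens are in (t, h, w) raster order; each axis id is the
        # grid coordinate offset by the image's start position.
        for ti in range(t):
            for hi in range(h):
                for wi in range(w):
                    t_ids.append(next_pos + ti)
                    h_ids.append(next_pos + hi)
                    w_ids.append(next_pos + wi)
        # Following tokens resume one past the largest id used by the image.
        next_pos += max(t, h, w)
        i += n
    return [t_ids, h_ids, w_ids]
```

For example, two text tokens, a (1, 2, 2) image grid and one trailing text token yield t = [0, 1, 2, 2, 2, 2, 4], h = [0, 1, 2, 2, 3, 3, 4] and w = [0, 1, 2, 3, 2, 3, 4]. The trailing text token jumps to 4 because the image occupied ids up to 3. This is why the model needs to know which positions are multimodal: without correct token types, the vision span would be numbered like plain text.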
Changes:
1. Added logic in DataCollatorWithPaddingForMM to pad mm_token_type_ids on the side given by the tokenizer's padding_side, so the type ids stay aligned with input_ids.
2. Added dynamic binding for get_vision_position_ids, plus generation logic for multimodal token types, in model_providers.py so that 3D positional embeddings are computed correctly during the forward pass.
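Change 1 matters because mm_token_type_ids must line up position by position with input_ids. If the tokenizer pads on the left while the type ids are padded on the right, every multimodal marker shifts relative to its token. Below is a minimal sketch of side-aware padding, assuming token types are plain int lists and that padded positions use 0 (treated as text, and expected to be masked out by the attention mask). The helper name `pad_mm_token_type_ids` and the pad value are illustrative assumptions, not the actual DataCollatorWithPaddingForMM code.

```python
from typing import List, Sequence


def pad_mm_token_type_ids(
    batch: Sequence[Sequence[int]],
    padding_side: str = "right",
    pad_value: int = 0,  # assumption: 0 marks a text/padding position
) -> List[List[int]]:
    """Pad variable-length token-type sequences to the batch max length.

    padding_side must match the tokenizer's so that each type id stays
    aligned with its input id.
    """
    if padding_side not in ("left", "right"):
        raise ValueError(f"unsupported padding_side: {padding_side!r}")
    max_len = max((len(seq) for seq in batch), default=0)
    padded: List[List[int]] = []
    for seq in batch:
        pad = [pad_value] * (max_len - len(seq))
        # Left padding prepends, right padding appends, mirroring input_ids.
        padded.append(pad + list(seq) if padding_side == "left" else list(seq) + pad)
    return padded
```

In a collator built on a Hugging Face tokenizer, `padding_side` would typically be read from `tokenizer.padding_side`, so input_ids and mm_token_type_ids are padded on the same side by construction.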

Fixes #436

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

@CLAassistant

CLAassistant commented May 8, 2026

CLA assistant check
All committers have signed the CLA.

Wangxiaoxiaoa force-pushed the feat/qwen_vl_support branch from dec9235 to c8122a3 on May 8, 2026 05:52

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Multiple stability issues in RLVR pipeline: LoRA synchronization failure and recovery crash

2 participants