
Add position_ids to RoFormerForCausalLM forward pass#44705

Open
saivedant169 wants to merge 1 commit into huggingface:main from saivedant169:fix/issue-32937-roformer-position-ids

Conversation

@saivedant169

Fixes part of #32937

What does this PR do?

RoFormer introduced rotary position embeddings, but its ForCausalLM forward method doesn't accept position_ids — which means callers can't specify custom positions for packed sequences or flash attention without padding.
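For context, "packed sequences" concatenate several examples into one row, with positions restarting at zero for each segment; this is the case that needs explicit position_ids. A minimal sketch of building such positions (the helper name is an assumption, not part of this PR):

```python
import torch

def packed_position_ids(lengths):
    # Hypothetical helper: positions restart at 0 for each packed segment,
    # then everything is concatenated into a single [1, total_len] row.
    return torch.cat([torch.arange(n) for n in lengths]).unsqueeze(0)

# Two sequences of lengths 3 and 2 packed into one row:
pos = packed_position_ids([3, 2])  # tensor([[0, 1, 2, 0, 1]])
```

Without a position_ids argument on the forward pass, there is no way to hand positions like these to the model.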

The interesting bit is that RoFormerSinusoidalPositionalEmbedding.forward() already accepts a position_ids argument; it was just never wired up to the model's public API.

This PR threads position_ids through the full call chain:

RoFormerForCausalLM -> RoFormerModel -> RoFormerEncoder -> RoFormerSinusoidalPositionalEmbedding

When position_ids is None, behaviour is identical to before (sequential positions are generated internally). When provided, the embedding layer uses them directly.
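A rough sketch of that behaviour (this is an illustration of the sinusoidal lookup, not the actual RoFormer implementation, whose sin/cos layout differs):

```python
import torch

def sinusoidal_positions(seq_len, dim, position_ids=None):
    # Sketch: when position_ids is None, fall back to sequential positions
    # (the pre-PR behaviour); otherwise use the caller-provided tensor.
    if position_ids is None:
        position_ids = torch.arange(seq_len)
    inv_freq = 1.0 / (10000 ** (torch.arange(0, dim, 2).float() / dim))
    # [..., seq_len, 1] * [dim/2] -> [..., seq_len, dim/2]
    angles = position_ids.unsqueeze(-1).float() * inv_freq
    return torch.cat([angles.sin(), angles.cos()], dim=-1)
```

Note that a 1D position_ids yields a 2D output, while a 2D (batched) position_ids yields a 3D output, which is what motivates the shape handling described next.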

Shape handling during generation

GenerationMixin passes 2D position_ids of shape [batch_size, seq_len], which makes the embedding output 3D instead of the usual 2D. The encoder now checks the output dimensionality and reshapes accordingly — [1, 1, seq_len, dim] for the broadcast case, [batch, 1, seq_len, dim] when batch-specific positions are provided.
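The reshape logic described above can be sketched as follows (a simplified stand-in for the encoder-side check, not the exact code in this PR):

```python
import torch

def reshape_rotary(emb: torch.Tensor) -> torch.Tensor:
    # Sketch: a 2D [seq_len, dim] embedding broadcasts across batch and heads;
    # a 3D [batch, seq_len, dim] embedding keeps per-batch positions.
    if emb.dim() == 2:
        return emb[None, None, :, :]   # [1, 1, seq_len, dim]
    return emb[:, None, :, :]          # [batch, 1, seq_len, dim]
```

Either way the result is 4D, so downstream attention code sees a single consistent shape.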

How was it tested?

  • Full RoFormer test suite: 77 passed, 123 skipped, 3 xfailed, 0 failures
  • test_for_generate_causal_lm passes (this exercises the GenerationMixin position_ids path)
  • make style and make fix-repo both clean
python -m pytest tests/models/roformer/test_modeling_roformer.py -v

Coordination

Commented on #32937 here. This covers RoFormer specifically — other models missing position_ids can be addressed in separate PRs.

Threads position_ids through RoFormerForCausalLM -> RoFormerModel ->
RoFormerEncoder -> RoFormerSinusoidalPositionalEmbedding, enabling
explicit position control for flash attention packed sequence
optimization and API consistency with other CausalLM models.

Closes huggingface#32937 (for RoFormer)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: roformer

