Add position_ids to RoFormerForCausalLM forward pass by saivedant169 · Pull Request #44705 · huggingface/transformers

saivedant169 · 2026-03-14T16:48:06Z

Fixes part of #32937

What does this PR do?

RoFormer introduced rotary position embeddings, but RoFormerForCausalLM.forward() doesn't accept position_ids, so callers can't specify custom positions for packed sequences or for flash attention without padding.

The interesting bit is that RoFormerSinusoidalPositionalEmbedding.forward() already accepts a position_ids argument. It was just never wired up to the model's public API.

This PR threads position_ids through the full call chain:

RoFormerForCausalLM → RoFormerModel → RoFormerEncoder → RoFormerSinusoidalPositionalEmbedding

When position_ids is None, behaviour is identical to before (sequential positions are generated internally). When provided, the embedding layer uses them directly.
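For the packed-sequence case mentioned above, position IDs usually restart at zero at each sequence boundary instead of running sequentially across the whole row. A minimal sketch of building them in plain Python; the helper name `packed_position_ids` is hypothetical and not part of this PR:

```python
def packed_position_ids(seq_lengths):
    """Return position IDs that restart at 0 for each packed sequence.

    Hypothetical helper for illustration only; not part of transformers.
    """
    position_ids = []
    for length in seq_lengths:
        # Each packed sequence gets its own 0..length-1 positions.
        position_ids.extend(range(length))
    return position_ids


# Three sequences of lengths 3, 2 and 4 packed into one row of 9 tokens.
print(packed_position_ids([3, 2, 4]))  # [0, 1, 2, 0, 1, 0, 1, 2, 3]
```

Assuming the change in this PR, a list like this would be wrapped in a tensor of shape [batch_size, seq_len] and passed as position_ids to the model's forward call.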

Shape handling during generation

GenerationMixin passes 2D position_ids of shape [batch_size, seq_len], which makes the embedding output 3D ([batch_size, seq_len, dim]) instead of the usual 2D ([seq_len, dim]). The encoder now checks the output dimensionality and reshapes accordingly: [1, 1, seq_len, dim] for the broadcast case, and [batch_size, 1, seq_len, dim] when batch-specific positions are provided.
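The shape logic described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the PR: `lookup_positions` and `to_broadcast_shape` are hypothetical names, and a random table stands in for the sinusoidal embedding weights.

```python
import torch


def lookup_positions(table, position_ids=None, seq_len=None):
    # Stand-in for the embedding lookup: index a [max_positions, dim] table.
    if position_ids is None:
        # Default path: sequential 1D positions -> output [seq_len, dim].
        position_ids = torch.arange(seq_len)
    # 2D position_ids of shape [batch, seq_len] -> output [batch, seq_len, dim].
    return table[position_ids]


def to_broadcast_shape(sinusoidal_pos):
    # Add the head axis, plus a batch axis when positions are shared.
    if sinusoidal_pos.dim() == 2:
        return sinusoidal_pos[None, None, :, :]  # [1, 1, seq_len, dim]
    return sinusoidal_pos[:, None, :, :]  # [batch, 1, seq_len, dim]


table = torch.randn(16, 8)  # hypothetical: 16 positions, dim 8
default = to_broadcast_shape(lookup_positions(table, seq_len=5))
custom_ids = torch.tensor([[0, 1, 2, 0, 1], [0, 1, 2, 3, 4]])
custom = to_broadcast_shape(lookup_positions(table, custom_ids))

print(default.shape)  # torch.Size([1, 1, 5, 8])
print(custom.shape)   # torch.Size([2, 1, 5, 8])
```

Both results broadcast against per-head query and key tensors of shape [batch, num_heads, seq_len, dim], which is why only the batch axis differs between the two cases.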

How was it tested?

Full RoFormer test suite: 77 passed, 123 skipped, 3 xfailed, 0 failures
test_for_generate_causal_lm passes (this exercises the GenerationMixin → position_ids path)
make style and make fix-repo both clean

python -m pytest tests/models/roformer/test_modeling_roformer.py -v

Coordination

I commented on #32937 to coordinate. This PR covers RoFormer specifically; other models missing position_ids can be addressed in separate PRs.

Threads position_ids through RoFormerForCausalLM -> RoFormerModel -> RoFormerEncoder -> RoFormerSinusoidalPositionalEmbedding, enabling explicit position control for flash attention packed sequence optimization and API consistency with other CausalLM models.

Closes huggingface#32937 (for RoFormer)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-03-14T16:49:56Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: roformer

saivedant169 wants to merge 1 commit into huggingface:main from saivedant169:fix/issue-32937-roformer-position-ids
