Add position_ids to MptForCausalLM forward pass #44707
Open
saivedant169 wants to merge 1 commit into huggingface:main
Conversation
Threads position_ids through MptForCausalLM -> MptModel for API consistency with other CausalLM models. MPT uses ALiBi, so position_ids is accepted but not consumed by the attention layer — same pattern as the Bloom PR. Part of huggingface#32937
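The threading pattern described above can be sketched with minimal stand-in classes. These are hypothetical placeholders, not the real Transformers implementation; only the parameter-passing shape is the point: `position_ids` is named explicitly in both signatures and forwarded to the backbone, which accepts but ignores it.

```python
# Hypothetical minimal stand-ins illustrating the signature change;
# class names mirror the real ones but the bodies are placeholders.
class MptModel:
    def forward(self, input_ids, position_ids=None, **kwargs):
        # position_ids is accepted for API parity but deliberately
        # unused: ALiBi encodes position in the attention bias instead.
        del position_ids
        return {"hidden_states": input_ids}


class MptForCausalLM:
    def __init__(self):
        self.transformer = MptModel()

    def forward(self, input_ids, position_ids=None, **kwargs):
        # Thread position_ids through to the backbone explicitly,
        # rather than letting it be silently absorbed via **kwargs.
        return self.transformer.forward(
            input_ids, position_ids=position_ids, **kwargs
        )
```

Before the change, passing `position_ids` would only work by accident through `**kwargs` (or fail outright); after it, the parameter is part of the documented signature, matching other CausalLM models.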
[For maintainers] Suggested jobs to run (before merge): run-slow: mpt
Fixes part of #32937
What does this PR do?
Adds position_ids as an explicit parameter to MptForCausalLM.forward() and MptModel.forward(), bringing MPT in line with other CausalLM models.

Same rationale as the Bloom PR (#44706): MPT uses ALiBi, so position_ids isn't consumed by the attention layer, but it should appear in the forward signature rather than being silently absorbed through **kwargs.

Part of the series: … (position_ids is actively used)

How was it tested?
make style passes cleanly.

Coordination
Commented on #32937.
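The ALiBi rationale above can be made concrete with a small sketch. This is a simplified illustration (assuming a power-of-two head count), not MPT's actual bias code: the bias is built purely from head slopes and key indices, so a caller-supplied position_ids tensor has nothing to contribute.

```python
def alibi_slopes(num_heads):
    # One fixed slope per attention head, drawn from a geometric
    # sequence (simplified: assumes num_heads is a power of two).
    start = 2 ** (-8 / num_heads)
    return [start ** (i + 1) for i in range(num_heads)]


def alibi_bias(num_heads, seq_len):
    # bias[h][j] = slope_h * j: it depends only on the key index j,
    # never on a caller-supplied position_ids tensor.
    slopes = alibi_slopes(num_heads)
    return [[s * j for j in range(seq_len)] for s in slopes]
```

Because the positional signal is baked into this additive attention bias, accepting position_ids in the forward signature is purely an API-consistency measure for MPT.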