Refactor gptj output tracing to use standardized decorators#44722

Open
chandan11248 wants to merge 3 commits into huggingface:main from chandan11248:refactor/gptj-output-tracing
Conversation

@chandan11248
What does this PR do?

Migrates the GPT-J model to use the new @capture_outputs and @can_return_tuple decorators for standardized output collection, as described in #43979.

Changes

  • Added _can_record_outputs to GPTJPreTrainedModel, mapping "hidden_states" to GPTJBlock and "attentions" to GPTJAttention
  • Added @capture_outputs and @merge_with_config_defaults decorators to GPTJModel.forward()
  • Added @can_return_tuple decorator to GPTJForCausalLM, GPTJForSequenceClassification, and GPTJForQuestionAnswering
  • Removed output_attentions, output_hidden_states, and return_dict parameters from all forward() signatures
  • Removed manual accumulator loops (all_hidden_states, all_self_attentions) and return_dict branching from GPTJModel.forward()
  • Simplified GPTJBlock.forward() to return a plain torch.Tensor instead of a tuple
  • Cleaned up attention forward signatures to always return (attn_output, attn_weights) with a simplified type annotation

Net result: 38 insertions, 108 deletions — cleaner architecture with no manual output collection boilerplate.
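The hook-based capture pattern the PR describes can be sketched in plain Python. This is an illustrative simplification, not the actual transformers implementation: the `Block`/`Model` classes are hypothetical stand-ins for GPTJBlock/GPTJModel, and the real decorator uses PyTorch forward hooks rather than method patching.

```python
# Sketch of `_can_record_outputs` + a capture decorator (hypothetical,
# simplified; the real transformers code hooks torch.nn.Module instances).
from functools import wraps

class Block:
    """Stand-in for a transformer block (e.g. GPTJBlock in this PR)."""
    def forward(self, x):
        return [v + 1 for v in x]

def capture_outputs(forward_fn):
    """Record outputs of submodules listed in `_can_record_outputs` while
    forward() runs, so the layer loop needs no manual accumulators."""
    @wraps(forward_fn)
    def wrapper(self, x):
        records = {name: [] for name in self._can_record_outputs}
        patched = []
        for layer in self.layers:
            for name, cls in self._can_record_outputs.items():
                if isinstance(layer, cls):
                    orig = layer.forward
                    def hooked(inp, _orig=orig, _bucket=records[name]):
                        out = _orig(inp)
                        _bucket.append(out)  # the "hook": record the output
                        return out
                    layer.forward = hooked
                    patched.append((layer, orig))
        try:
            output = forward_fn(self, x)
        finally:
            for layer, orig in patched:  # always unpatch
                layer.forward = orig
        output.update({k: tuple(v) for k, v in records.items()})
        return output
    return wrapper

class Model:
    # Mirrors the mapping the PR adds to GPTJPreTrainedModel.
    _can_record_outputs = {"hidden_states": Block}

    def __init__(self, num_layers=3):
        self.layers = [Block() for _ in range(num_layers)]

    @capture_outputs
    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)  # no all_hidden_states boilerplate here
        return {"last_hidden_state": x}

out = Model().forward([0])
# out["last_hidden_state"] == [3]; out["hidden_states"] holds 3 per-layer entries
```

The point of the pattern is that GPTJBlock.forward() can return a plain tensor, because collection happens outside the block via the mapping, which is what lets the PR delete the accumulator loops and tuple returns.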

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Fixes #43979 (gptj model)

…tuple

Migrate the GPT-J model to use the new standardized output collection
decorators, replacing manual accumulation of hidden states and attention
weights with hook-based capturing.

Changes:
- Add `_can_record_outputs` to `GPTJPreTrainedModel` mapping hidden_states
  to GPTJBlock and attentions to GPTJAttention
- Add `@capture_outputs` and `@merge_with_config_defaults` to
  `GPTJModel.forward()`
- Add `@can_return_tuple` to all task head models (ForCausalLM,
  ForSequenceClassification, ForQuestionAnswering)
- Remove `output_attentions`, `output_hidden_states`, and `return_dict`
  parameters from all forward signatures
- Remove manual accumulator loops and return_dict branching
- Simplify GPTJBlock to return plain `torch.Tensor` instead of tuple
- Update attention forward signatures to always return
  `(attn_output, attn_weights)` without conditional logic

Resolves huggingface#43979
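The `@can_return_tuple` side of the refactor can be sketched similarly. Again a hypothetical simplification: the real transformers decorator operates on ModelOutput dataclasses, while this sketch uses a plain dict to show how the legacy tuple format is served without `return_dict` branching inside forward() itself.

```python
# Sketch of a can_return_tuple-style decorator (illustrative only).
from functools import wraps

def can_return_tuple(forward_fn):
    """Let callers opt into the legacy tuple return format, keeping
    forward() itself free of return_dict branching."""
    @wraps(forward_fn)
    def wrapper(self, *args, return_dict=True, **kwargs):
        output = forward_fn(self, *args, **kwargs)  # always dict-like
        if return_dict:
            return output
        # Legacy tuple form: values in declaration order, None entries dropped.
        return tuple(v for v in output.values() if v is not None)
    return wrapper

class Head:
    """Stand-in for a task head such as GPTJForCausalLM."""
    @can_return_tuple
    def forward(self, x):
        return {"loss": None, "logits": [x * 2]}

head = Head()
head.forward(3)                     # {'loss': None, 'logits': [6]}
head.forward(3, return_dict=False)  # ([6],)
```

This is why the PR can drop the `return_dict` parameter from the forward() signatures: the decorator handles the conversion uniformly for all three task heads.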
The CodeGenBlock is a documented copy of GPTJBlock. This syncs it
to match the updated signature after removing output_attentions
parameter and simplifying the return type to plain torch.Tensor.

Generated via `python utils/check_copies.py --fix_and_overwrite`.
The previous commit auto-synced CodeGenBlock.forward() with the
refactored GPTJBlock, but CodeGenModel still passes output_attentions
to CodeGenBlock and expects a tuple return. Since the CodeGen model
has not been refactored to use the new decorators yet, restore
CodeGenBlock's original forward() signature and remove the
'# Copied from' directive to decouple it from GPTJBlock until
CodeGen gets its own output tracing refactor.
@github-actions

[For maintainers] Suggested jobs to run (before merge)

run-slow: codegen, gptj

Development

Successfully merging this pull request may close these issues.

Call to contributions: refactor output tracing in transformers