Refactor gptj output tracing to use standardized decorators #44722
Open
chandan11248 wants to merge 3 commits into huggingface:main from
Conversation
…tuple

Migrate the GPT-J model to use the new standardized output collection decorators, replacing manual accumulation of hidden states and attention weights with hook-based capturing.

Changes:
- Add `_can_record_outputs` to `GPTJPreTrainedModel`, mapping hidden_states to GPTJBlock and attentions to GPTJAttention
- Add `@capture_outputs` and `@merge_with_config_defaults` to `GPTJModel.forward()`
- Add `@can_return_tuple` to all task head models (ForCausalLM, ForSequenceClassification, ForQuestionAnswering)
- Remove `output_attentions`, `output_hidden_states`, and `return_dict` parameters from all forward signatures
- Remove manual accumulator loops and return_dict branching
- Simplify GPTJBlock to return a plain `torch.Tensor` instead of a tuple
- Update attention forward signatures to always return `(attn_output, attn_weights)` without conditional logic

Resolves huggingface#43979
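The shift described above can be sketched in plain Python. This is an illustrative toy, not the actual transformers implementation: the names `capture_outputs` and `_can_record_outputs` mirror the PR, but the recorder mechanics, the `Block` class, and the arithmetic are invented stand-ins for hook-based capturing.

```python
# Illustrative sketch (NOT the real transformers API): instead of each
# forward loop appending to all_hidden_states, the model declares which
# submodule outputs are recordable, and a decorator wires up a recorder
# that captures them as the blocks run.
from functools import wraps


class Block:
    """Stand-in for GPTJBlock: returns a plain value, not a tuple."""

    def __init__(self):
        self._recorder = None

    def __call__(self, hidden):
        out = hidden + 1  # placeholder for the real block computation
        if self._recorder is not None:
            self._recorder.setdefault("hidden_states", []).append(out)
        return out


def capture_outputs(forward):
    """Toy decorator: attach a recorder to each block, run forward,
    then merge whatever was captured into the returned dict."""

    @wraps(forward)
    def wrapper(self, *args, **kwargs):
        recorder = {}
        for block in self.blocks:
            block._recorder = recorder
        result = forward(self, *args, **kwargs)
        result.update(recorder)
        return result

    return wrapper


class Model:
    # Declarative mapping: which output key comes from which module type.
    _can_record_outputs = {"hidden_states": Block}

    def __init__(self, n_blocks=3):
        self.blocks = [Block() for _ in range(n_blocks)]

    @capture_outputs
    def forward(self, x):
        # Note: no manual all_hidden_states accumulation in the loop.
        for block in self.blocks:
            x = block(x)
        return {"last_hidden_state": x}


out = Model().forward(0)
print(out["last_hidden_state"])  # 3
print(out["hidden_states"])      # [1, 2, 3]
```

The point of the pattern is that `forward()` stays free of collection boilerplate; what gets recorded is declared once, per module type, on the model class.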
The CodeGenBlock is a documented copy of GPTJBlock. This syncs it to match the updated signature after removing the output_attentions parameter and simplifying the return type to a plain torch.Tensor. Generated via `python utils/check_copies.py --fix_and_overwrite`.
The previous commit auto-synced CodeGenBlock.forward() with the refactored GPTJBlock, but CodeGenModel still passes output_attentions to CodeGenBlock and expects a tuple return. Since the CodeGen model has not been refactored to use the new decorators yet, restore CodeGenBlock's original forward() signature and remove the '# Copied from' directive to decouple it from GPTJBlock until CodeGen gets its own output tracing refactor.
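The mismatch the revert fixes can be shown with two toy signatures. These are hypothetical simplifications, not the real model code; the function bodies are placeholders that only demonstrate the return shapes:

```python
# Hypothetical shapes of the two block signatures, showing why the
# auto-sync broke CodeGenModel: the caller still unpacks a tuple.

def gptj_block_forward(hidden_states):
    """After the refactor: a plain value, no tuple wrapping."""
    attn_output = hidden_states  # placeholder computation
    return attn_output


def codegen_block_forward(hidden_states, output_attentions=False):
    """Restored original shape: CodeGenModel indexes into a tuple."""
    attn_output, attn_weights = hidden_states, None  # placeholder computation
    outputs = (attn_output,)
    if output_attentions:
        outputs += (attn_weights,)
    return outputs  # tuple, as the unrefactored CodeGenModel expects
```

A caller written against the tuple form (e.g. `outputs[0]`) would silently misbehave if handed the plain-tensor form, which is why decoupling the copy until CodeGen gets its own refactor is the safer move.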
Contributor
[For maintainers] Suggested jobs to run (before merge): run-slow: codegen, gptj
What does this PR do?
Migrates the GPT-J model to use the new `@capture_outputs` and `@can_return_tuple` decorators for standardized output collection, as described in #43979.

Changes

- Add `_can_record_outputs` to `GPTJPreTrainedModel`, mapping `"hidden_states"` → `GPTJBlock` and `"attentions"` → `GPTJAttention`
- Add the `@capture_outputs` and `@merge_with_config_defaults` decorators to `GPTJModel.forward()`
- Add the `@can_return_tuple` decorator to `GPTJForCausalLM`, `GPTJForSequenceClassification`, and `GPTJForQuestionAnswering`
- Remove the `output_attentions`, `output_hidden_states`, and `return_dict` parameters from all `forward()` signatures
- Remove the manual accumulators (`all_hidden_states`, `all_self_attentions`) and `return_dict` branching from `GPTJModel.forward()`
- Simplify `GPTJBlock.forward()` to return a plain `torch.Tensor` instead of a tuple
- Update the attention forward signature to always return `(attn_output, attn_weights)` with a simplified type annotation

Net result: 38 insertions, 108 deletions, with no manual output collection boilerplate left in the model.
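The `@can_return_tuple` idea from the change list can be sketched minimally. This is a hedged illustration of the pattern, not the transformers implementation: the decorator body, the `Head` class, and the keyword handling are assumptions made for the example.

```python
# Illustrative sketch of the can_return_tuple pattern: forward() always
# builds a dict-like output, and the decorator converts it to a tuple
# when the caller asks for one, so no return_dict branching lives in
# forward() itself.
from functools import wraps


def can_return_tuple(forward):
    @wraps(forward)
    def wrapper(self, *args, return_dict=True, **kwargs):
        output = forward(self, *args, **kwargs)
        if return_dict:
            return output
        return tuple(output.values())

    return wrapper


class Head:
    """Stand-in for a task head model such as GPTJForCausalLM."""

    @can_return_tuple
    def forward(self, x):
        # Single unconditional code path: always build the dict.
        return {"logits": x * 2, "loss": None}


head = Head()
print(head.forward(3))                     # {'logits': 6, 'loss': None}
print(head.forward(3, return_dict=False))  # (6, None)
```

Centralizing the dict-to-tuple conversion in one decorator is what lets the PR delete the per-model `return_dict` branches counted in those 108 removed lines.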
Before submitting
Fixes #43979 (gptj model)