
[model] fix: fix gpt oss export #3271

Merged
cuichenx merged 2 commits into NVIDIA-NeMo:main from yuki-97:yukih/fix-gpt-oss
Apr 10, 2026

Conversation


@yuki-97 yuki-97 commented Apr 10, 2026

Summary

Move the transpose of down_proj for GPT-OSS expert weights to the NeMo-RL side instead of the bridge side, to avoid a mismatched layout when saving checkpoints.

History:

  • [model] fix: fix gpt-oss down_proj weight handling #3162 fixed a double-transpose bug: both import and export were transposing down_proj. The import-side transpose was removed, leaving only the export transpose in GPTOSSMLPDownProjMapping.megatron_to_hf. This reconciled Megatron's [in, out] layout with vLLM's [out, in] layout.
  • However, the intermediate [out, in] layout produced by the bridge transpose corrupted saved checkpoints (save_hf_pretrained wrote weights that didn't match the original HF [in, out] layout).
  • [model] fix: Use adaptive transpose_on_export for GPT-OSS expert weights #3250 papered over this with a post-export generator wrapper that un-transposed weights before saving: correct, but effectively "transpose then un-transpose".

Root cause: The transpose is a NeMo-RL refit concern, not a bridge concern. The bridge's megatron_to_hf should always produce standard HF layout.
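The intended split of responsibilities can be sketched as follows. This is a minimal illustration, not the actual bridge or NeMo-RL code: the function names and the nested-list stand-in for a tensor are hypothetical.

```python
def megatron_to_hf_down_proj(w):
    # Bridge side: Megatron and HF both store down_proj as [in_features, out_features],
    # so the bridge emits the standard HF layout unchanged (no transpose).
    return w

def refit_down_proj_for_vllm(w):
    # NeMo-RL refit side: vLLM expects [out_features, in_features], so the
    # transpose happens only where the inference engine actually needs it.
    return [list(col) for col in zip(*w)]

w = [[1, 2, 3], [4, 5, 6]]  # toy [in=2, out=3] matrix standing in for a tensor
print(megatron_to_hf_down_proj(w))   # [[1, 2, 3], [4, 5, 6]] -> checkpoint-safe HF layout
print(refit_down_proj_for_vllm(w))   # [[1, 4], [2, 5], [3, 6]] -> vLLM layout, refit only
```

Because save_hf_pretrained consumes the bridge output, keeping the bridge transpose-free means saved checkpoints match the original HF layout by construction.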

Changes

Testing

Validated. See validate steps and results in NVIDIA-NeMo/RL#2249.

Summary by CodeRabbit

  • Refactor

    • Streamlined weight export logic by removing model-specific conditional handling across conversion utilities and bridge modules.
  • Chores

    • Updated documentation to reflect simplified weight handling approach.

copy-pr-bot bot commented Apr 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@yuki-97 yuki-97 force-pushed the yukih/fix-gpt-oss branch from 43e3c2c to 8e048be on April 10, 2026 13:59
yuki-97 added 2 commits April 10, 2026 07:00
…ert weights (NVIDIA-NeMo#3250)"

This reverts commit b139a4b.

Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
@yuki-97 yuki-97 force-pushed the yukih/fix-gpt-oss branch from 8e048be to c4d5eff on April 10, 2026 14:01
coderabbitai bot commented Apr 10, 2026

No actionable comments were generated in the recent review. 🎉


📥 Commits

Reviewing files that changed from the base of the PR and between 5064fbc and c4d5eff.

📒 Files selected for processing (5)
  • examples/conversion/hf_megatron_roundtrip_multi_gpu.py
  • src/megatron/bridge/models/conversion/auto_bridge.py
  • src/megatron/bridge/models/conversion/utils.py
  • src/megatron/bridge/models/gpt_oss/gpt_oss_bridge.py
  • src/megatron/bridge/utils/common_utils.py
💤 Files with no reviewable changes (2)
  • src/megatron/bridge/utils/common_utils.py
  • src/megatron/bridge/models/conversion/auto_bridge.py

📝 Walkthrough

Walkthrough

The PR removes GPT-OSS-specific export transpose handling that was previously applied globally during HF safetensors weight export. This includes deleting utility functions (get_hf_model_type, fix_gpt_oss_export_transpose) and conditional transpose-fix wrapping across export utilities, while updating GPTOSSMLPDownProjMapping.megatron_to_hf to remove its transpose behavior.

Changes

Cohort / File(s) Summary
Export utilities
src/megatron/bridge/models/conversion/auto_bridge.py, src/megatron/bridge/models/conversion/utils.py
Removed imports of fix_gpt_oss_export_transpose and get_hf_model_type; deleted conditional wrapping that applied transpose fix to weight iterators during export.
Example script
examples/conversion/hf_megatron_roundtrip_multi_gpu.py
Removed imports and conditional unwrapping logic for GPT-OSS transpose fix; weight export loop now iterates directly without intermediate transformation.
GPT-OSS model bridge
src/megatron/bridge/models/gpt_oss/gpt_oss_bridge.py
Updated documentation for maybe_modify_loaded_hf_weight and GPTOSSMLPDownProjMapping; modified megatron_to_hf to remove transpose operation and always pass contiguous weights to superclass.
Common utilities
src/megatron/bridge/utils/common_utils.py
Removed get_hf_model_type() function, _GPT_OSS_TRANSPOSED_SUFFIXES constant, and fix_gpt_oss_export_transpose() generator wrapper.
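The deleted generator wrapper followed a post-export "un-transpose" pattern along these lines. The function and constant names come from the PR; the body and the suffix value are a hypothetical reconstruction using nested lists in place of tensors.

```python
# Illustrative suffix; the real constant matched GPT-OSS expert down_proj weight names.
_GPT_OSS_TRANSPOSED_SUFFIXES = ("experts.down_proj",)

def transpose_2d(w):
    # Stand-in for tensor.transpose(-2, -1) on a nested-list "tensor".
    return [list(col) for col in zip(*w)]

def fix_gpt_oss_export_transpose(named_weights):
    # Generator wrapper: undo the bridge's export transpose for matching expert
    # weights so save_hf_pretrained writes the original HF [in, out] layout.
    for name, w in named_weights:
        if name.endswith(_GPT_OSS_TRANSPOSED_SUFFIXES):
            w = transpose_2d(w)
        yield name, w
```

With the bridge no longer transposing in megatron_to_hf, this extra pass has nothing to undo, which is why the PR deletes it rather than keeping the round-trip transpose.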

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 75.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
  • Test Results For Major Changes ⚠️ Warning: the PR contains major changes affecting weight export and model conversion logic but lacks documented test results or validation evidence. Resolution: add test results to the PR description demonstrating successful weight export, numerical correctness of the conversion, and absence of regressions.
  • Title check ❓ Inconclusive: the title "fix gpt oss export" is vague; it mentions GPT-OSS export without explaining what the fix entails (removing transpose handling, simplifying conditional logic, etc.). Resolution: use a more specific title such as "Remove GPT-OSS export transpose handling" or "Simplify GPT-OSS weight export by removing conditional transpose fix".
✅ Passed checks (1 passed)
  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.


@yaoyu-33 yaoyu-33 added the needs-more-tests Requires additional L0 and L1 test coverage before merge label Apr 10, 2026
@cuichenx
Copy link
Copy Markdown
Contributor

/ok to test c4d5eff

@yaoyu-33 yaoyu-33 added the ready-to-merge PR is approved, current, and only waiting for CI to pass before merge label Apr 10, 2026
@yaoyu-33 yaoyu-33 enabled auto-merge (squash) April 10, 2026 17:58
@cuichenx cuichenx disabled auto-merge April 10, 2026 18:34
@cuichenx cuichenx merged commit 0182c79 into NVIDIA-NeMo:main Apr 10, 2026
72 of 73 checks passed