
[model] fix: fix gpt oss export #3271

Merged
cuichenx merged 2 commits into NVIDIA-NeMo:main from yuki-97:yukih/fix-gpt-oss
Apr 10, 2026

Conversation


@yuki-97 yuki-97 commented Apr 10, 2026

Summary

Move the transpose of down_proj for GPT-OSS expert weights to the NeMo-RL side instead of the bridge side, to avoid a mismatched layout when saving checkpoints.

History:

  • [model] fix: fix gpt-oss down_proj weight handling #3162 fixed a double-transpose bug: both import and export were transposing down_proj. The import-side transpose was removed, leaving only the export transpose in GPTOSSMLPDownProjMapping.megatron_to_hf. This reconciled Megatron's [in, out] layout with vLLM's [out, in] layout.
  • However, the intermediate [out, in] layout produced by the bridge transpose corrupted saved checkpoints (save_hf_pretrained wrote weights that didn't match the original HF [in, out] layout).
  • [model] fix: Use adaptive transpose_on_export for GPT-OSS expert weights #3250 papered over this with a post-export generator wrapper that un-transposed weights before saving: correct, but effectively "transpose then un-transpose".

Root cause: The transpose is a NeMo-RL refit concern, not a bridge concern. The bridge's megatron_to_hf should always produce standard HF layout.
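The intended split of responsibilities can be sketched as follows. This is a minimal illustration, not the actual bridge or NeMo-RL code: the function names and the nested-list stand-in for a tensor are hypothetical.

```python
def megatron_to_hf_down_proj(w):
    # Bridge side: Megatron and HF both store down_proj as [in_features, out_features],
    # so the bridge emits the standard HF layout unchanged (no transpose).
    return w

def refit_down_proj_for_vllm(w):
    # NeMo-RL refit side: vLLM expects [out_features, in_features], so the
    # transpose happens only where the inference engine actually needs it.
    return [list(col) for col in zip(*w)]

w = [[1, 2, 3], [4, 5, 6]]  # toy [in=2, out=3] matrix standing in for a tensor
print(megatron_to_hf_down_proj(w))   # [[1, 2, 3], [4, 5, 6]] -> checkpoint-safe HF layout
print(refit_down_proj_for_vllm(w))   # [[1, 4], [2, 5], [3, 6]] -> vLLM layout, refit only
```

Because save_hf_pretrained consumes the bridge output, keeping the bridge transpose-free means saved checkpoints match the original HF layout by construction.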

Changes

Testing

Validated. See validate steps and results in NVIDIA-NeMo/RL#2249.

Summary by CodeRabbit

  • Refactor

    • Streamlined weight export logic by removing model-specific conditional handling across conversion utilities and bridge modules.
  • Chores

    • Updated documentation to reflect simplified weight handling approach.

copy-pr-bot bot commented Apr 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@yuki-97 yuki-97 force-pushed the yukih/fix-gpt-oss branch from 43e3c2c to 8e048be on April 10, 2026 13:59
yuki-97 added 2 commits April 10, 2026 07:00
…ert weights (NVIDIA-NeMo#3250)"

This reverts commit b139a4b.

Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
@yuki-97 yuki-97 force-pushed the yukih/fix-gpt-oss branch from 8e048be to c4d5eff on April 10, 2026 14:01
coderabbitai bot commented Apr 10, 2026

No actionable comments were generated in the recent review. 🎉


📥 Commits

Reviewing files that changed from the base of the PR and between 5064fbc and c4d5eff.

📒 Files selected for processing (5)
  • examples/conversion/hf_megatron_roundtrip_multi_gpu.py
  • src/megatron/bridge/models/conversion/auto_bridge.py
  • src/megatron/bridge/models/conversion/utils.py
  • src/megatron/bridge/models/gpt_oss/gpt_oss_bridge.py
  • src/megatron/bridge/utils/common_utils.py
💤 Files with no reviewable changes (2)
  • src/megatron/bridge/utils/common_utils.py
  • src/megatron/bridge/models/conversion/auto_bridge.py

📝 Walkthrough

Walkthrough

The PR removes GPT-OSS-specific export transpose handling that was previously applied globally during HF safetensors weight export. This includes deleting utility functions (get_hf_model_type, fix_gpt_oss_export_transpose) and conditional transpose-fix wrapping across export utilities, while updating GPTOSSMLPDownProjMapping.megatron_to_hf to remove its transpose behavior.

Changes

Cohort / File(s) Summary
Export utilities
src/megatron/bridge/models/conversion/auto_bridge.py, src/megatron/bridge/models/conversion/utils.py
Removed imports of fix_gpt_oss_export_transpose and get_hf_model_type; deleted conditional wrapping that applied transpose fix to weight iterators during export.
Example script
examples/conversion/hf_megatron_roundtrip_multi_gpu.py
Removed imports and conditional unwrapping logic for GPT-OSS transpose fix; weight export loop now iterates directly without intermediate transformation.
GPT-OSS model bridge
src/megatron/bridge/models/gpt_oss/gpt_oss_bridge.py
Updated documentation for maybe_modify_loaded_hf_weight and GPTOSSMLPDownProjMapping; modified megatron_to_hf to remove transpose operation and always pass contiguous weights to superclass.
Common utilities
src/megatron/bridge/utils/common_utils.py
Removed get_hf_model_type() function, _GPT_OSS_TRANSPOSED_SUFFIXES constant, and fix_gpt_oss_export_transpose() generator wrapper.
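The deleted generator wrapper followed a post-export "un-transpose" pattern along these lines. The function and constant names come from the PR; the body and the suffix value are a hypothetical reconstruction using nested lists in place of tensors.

```python
# Illustrative suffix; the real constant matched GPT-OSS expert down_proj weight names.
_GPT_OSS_TRANSPOSED_SUFFIXES = ("experts.down_proj",)

def transpose_2d(w):
    # Stand-in for tensor.transpose(-2, -1) on a nested-list "tensor".
    return [list(col) for col in zip(*w)]

def fix_gpt_oss_export_transpose(named_weights):
    # Generator wrapper: undo the bridge's export transpose for matching expert
    # weights so save_hf_pretrained writes the original HF [in, out] layout.
    for name, w in named_weights:
        if name.endswith(_GPT_OSS_TRANSPOSED_SUFFIXES):
            w = transpose_2d(w)
        yield name, w
```

With the bridge no longer transposing in megatron_to_hf, this extra pass has nothing to undo, which is why the PR deletes it rather than keeping the round-trip transpose.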

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 75.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
  • Test Results For Major Changes ⚠️ Warning: the PR contains major changes affecting weight export and model conversion logic but lacks documented test results or validation evidence. Resolution: add test results to the PR description demonstrating successful weight export, numerical correctness of the conversion, and absence of regressions.
  • Title check ❓ Inconclusive: the title "fix gpt oss export" is vague; it mentions GPT-OSS export without explaining what the fix entails (removing transpose handling, simplifying conditional logic, etc.). Resolution: use a more specific title such as "Remove GPT-OSS export transpose handling" or "Simplify GPT-OSS weight export by removing conditional transpose fix".
✅ Passed checks (1 passed)
  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.


@yaoyu-33 yaoyu-33 added the needs-more-tests Requires additional L0 and L1 test coverage before merge label Apr 10, 2026
@cuichenx
Copy link
Copy Markdown
Contributor

/ok to test c4d5eff

@yaoyu-33 yaoyu-33 added the ready-to-merge PR is approved, current, and only waiting for CI to pass before merge label Apr 10, 2026
@yaoyu-33 yaoyu-33 enabled auto-merge (squash) April 10, 2026 17:58
@cuichenx cuichenx disabled auto-merge April 10, 2026 18:34
@cuichenx cuichenx merged commit 0182c79 into NVIDIA-NeMo:main Apr 10, 2026
72 of 73 checks passed