fix: fix gpt oss export by yuki-97 · Pull Request #2249 · NVIDIA-NeMo/RL

yuki-97 · 2026-04-10T14:24:03Z

Previously, examples/converters/convert_megatron_to_hf.py exported gpt-oss models with an incorrect weight layout. This PR fixes that. See NVIDIA-NeMo/Megatron-Bridge#3271 for more details.
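As a rough illustration of why a layout error matters, here is a toy sketch (not the converter code). It assumes the layout problem is a transpose, which is only a guess based on the "export-transpose" checkpoint directory name used in the validation commands. The helpers `transpose` and `matvec` are hypothetical names introduced for this example.

```python
# Toy illustration only: a transposed square weight keeps its shape
# but changes the result of a projection.

def transpose(w):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*w)]

def matvec(w, x):
    """Compute w @ x for a list-of-rows matrix and a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in w]

weight = [[1.0, 2.0], [3.0, 4.0]]  # hypothetical 2x2 projection weight
x = [1.0, 0.0]

correct = matvec(weight, x)           # uses the intended layout
wrong = matvec(transpose(weight), x)  # uses the transposed layout

# Both layouts have the same shape, so a shape check alone cannot tell them apart.
flipped = transpose(weight)
same_shape = (len(weight), len(weight[0])) == (len(flipped), len(flipped[0]))
```

Because the shapes match, a mistake like this would only show up in model outputs, which is why the validation compares generation quality end to end.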

Validation steps:

Import the HF checkpoint into Megatron, train for one step, and save the Megatron checkpoint.

NRL_FORCE_REBUILD_VENVS=true \
uv run python examples/run_grpo.py \
    --config examples/configs/recipes/llm/grpo-gptoss-20b-8n8g-megatron.yaml \
    grpo.max_num_steps=1 \
    policy.max_total_sequence_length=512 \
    logger.wandb_enabled=false \
    logger.tensorboard_enabled=false \
    checkpointing.enabled=True \
    checkpointing.checkpoint_dir=results/grpo-gptoss-20b-8n8g-megatron-test-export-transpose \
    checkpointing.save_period=1

Convert the saved Megatron checkpoint to HF format.

uv run --extra mcore python examples/converters/convert_megatron_to_hf.py \
    --config results/grpo-gptoss-20b-8n8g-megatron-test-export-transpose/step_1/config.yaml \
    --hf-model-name unsloth/gpt-oss-20b-BF16 \
    --megatron-ckpt-path results/grpo-gptoss-20b-8n8g-megatron-test-export-transpose/step_1/policy/weights/iter_0000000 \
    --hf-ckpt-path results/step_1_hf
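A cheap sanity check after conversion is to compare parameter shapes between the original HF checkpoint and the converted one. The sketch below is a hypothetical helper, not part of this PR or of NeMo RL. It assumes you have already collected `{parameter_name: shape}` mappings for both checkpoints, and the parameter names in the usage example are made up.

```python
def find_layout_mismatches(reference_shapes, converted_shapes):
    """Compare two {param_name: shape} mappings.

    Returns a dict mapping each problematic name to one of
    "missing", "transposed", or "shape_mismatch".
    """
    problems = {}
    for name, ref_shape in reference_shapes.items():
        if name not in converted_shapes:
            problems[name] = "missing"
            continue
        ref_shape = tuple(ref_shape)
        new_shape = tuple(converted_shapes[name])
        if new_shape == ref_shape:
            continue
        # A swap of the last two dimensions is reported as a likely transpose.
        swapped = ref_shape[:-2] + (ref_shape[-1], ref_shape[-2]) if len(ref_shape) >= 2 else None
        problems[name] = "transposed" if new_shape == swapped else "shape_mismatch"
    return problems


# Hypothetical parameter names and shapes, for illustration only.
reference = {"experts.proj": (2, 3, 4), "norm.weight": (5,), "head.weight": (2, 2)}
converted = {"experts.proj": (2, 4, 3), "norm.weight": (6,)}
report = find_layout_mismatches(reference, converted)
```

A shape comparison cannot detect a transposed square matrix, so it complements the retraining check in the next step rather than replacing it.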

Train again from the converted HF checkpoint.

uv run python examples/run_grpo.py \
    --config examples/configs/recipes/llm/grpo-gptoss-20b-8n8g-megatron.yaml \
    policy.model_name=results/step_1_hf \
    grpo.max_num_steps=1 \
    policy.max_total_sequence_length=512 \
    logger.wandb_enabled=false \
    logger.tensorboard_enabled=false \
    checkpointing.enabled=false

Results of step 3 (retraining from the converted HF checkpoint):
Before this PR:

  • Generation KL Error: 13.0520
  • Avg Reward: 0.0000

After this PR:

  • Generation KL Error: 0.0009
  • Avg Reward: 0.3960
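This PR does not state how "Generation KL Error" is computed. As a hedged sketch of the general idea, the function below estimates a KL divergence from per-token log-probabilities that the generation backend and the training policy assign to the same sampled tokens. The estimator and the name `approx_kl` are assumptions for illustration, not the metric's actual implementation.

```python
import math

def approx_kl(gen_logprobs, train_logprobs):
    """Estimate KL(generation || training) from per-token log-probs.

    Uses the non-negative estimator exp(d) - 1 - d with
    d = train_logprob - gen_logprob, averaged over tokens.
    """
    if len(gen_logprobs) != len(train_logprobs):
        raise ValueError("log-prob lists must have the same length")
    if not gen_logprobs:
        raise ValueError("log-prob lists must not be empty")
    total = 0.0
    for lg, lt in zip(gen_logprobs, train_logprobs):
        log_ratio = lt - lg
        total += math.exp(log_ratio) - 1.0 - log_ratio
    return total / len(gen_logprobs)
```

When both sides hold the same weights in the same layout, the log-probabilities nearly agree and the estimate stays close to zero. A large value, like the one reported before this PR, indicates that the two sides are effectively running different models.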

Signed-off-by: Yuki Huang <yukih@nvidia.com>

copy-pr-bot · 2026-04-10T14:24:34Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


yuki-97 added 3 commits April 10, 2026 07:13

update test time

9a8b56a

Signed-off-by: Yuki Huang <yukih@nvidia.com>

move down_proj handle to vllm

e4d9257

Signed-off-by: Yuki Huang <yukih@nvidia.com>

[tmp] bump mbridge

6fa2609

Signed-off-by: Yuki Huang <yukih@nvidia.com>

yuki-97 mentioned this pull request Apr 10, 2026

[model] fix: fix gpt oss export NVIDIA-NeMo/Megatron-Bridge#3271

Merged

yuki-97 wants to merge 3 commits into main from yukih/fix-gpt-oss
