fix: fix gpt oss export#2249

Draft
yuki-97 wants to merge 3 commits into main from yukih/fix-gpt-oss

Conversation

yuki-97 commented Apr 10, 2026

Previously, examples/converters/convert_megatron_to_hf.py exported gpt-oss models with an incorrect layout. This PR fixes that. See NVIDIA-NeMo/Megatron-Bridge#3271 for more details.
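
The PR text does not spell out what the layout error is; the checkpoint directory name in the validation steps hints at a transpose. Under that assumption only, here is a minimal sketch of why such a bug can pass silently at load time: a transposed square weight has the same shape as the correct one but produces different outputs. The helper names `matvec` and `transpose` are hypothetical and not part of the converter.

```python
# Illustrative only: assumes the layout error is a transposed weight matrix.
# None of these helpers exist in the converter script.

def matvec(weight, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def transpose(weight):
    """Return the transpose of a row-major matrix."""
    return [list(col) for col in zip(*weight)]


w = [[1.0, 2.0],
     [3.0, 4.0]]
x = [1.0, 0.0]

correct = matvec(w, x)            # uses the intended layout
wrong = matvec(transpose(w), x)   # same shape, wrong layout

# The shapes match, so a shape check alone would not catch the mistake.
same_shape = (len(w), len(w[0])) == (len(transpose(w)), len(transpose(w)[0]))
print(same_shape, correct, wrong)
```

Because a shape check cannot distinguish the two, an end-to-end check such as the retraining run in the validation steps is a more reliable signal.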

Validation steps:

  1. Import the HF model into Megatron, train for one step, and save the Megatron checkpoint.
    NRL_FORCE_REBUILD_VENVS=true \
    uv run python examples/run_grpo.py \
        --config examples/configs/recipes/llm/grpo-gptoss-20b-8n8g-megatron.yaml \
        grpo.max_num_steps=1 \
        policy.max_total_sequence_length=512 \
        logger.wandb_enabled=false \
        logger.tensorboard_enabled=false \
        checkpointing.enabled=True \
        checkpointing.checkpoint_dir=results/grpo-gptoss-20b-8n8g-megatron-test-export-transpose \
        checkpointing.save_period=1
    
  2. Convert the saved Megatron checkpoint to HF format.
    uv run --extra mcore python examples/converters/convert_megatron_to_hf.py \
        --config results/grpo-gptoss-20b-8n8g-megatron-test-export-transpose/step_1/config.yaml \
        --hf-model-name unsloth/gpt-oss-20b-BF16 \
        --megatron-ckpt-path results/grpo-gptoss-20b-8n8g-megatron-test-export-transpose/step_1/policy/weights/iter_0000000 \
        --hf-ckpt-path results/step_1_hf
    
  3. Train again, starting from the converted HF checkpoint.
    uv run python examples/run_grpo.py \
        --config examples/configs/recipes/llm/grpo-gptoss-20b-8n8g-megatron.yaml \
        policy.model_name=results/step_1_hf \
        grpo.max_num_steps=1 \
        policy.max_total_sequence_length=512 \
        logger.wandb_enabled=false \
        logger.tensorboard_enabled=false \
        checkpointing.enabled=false
    

Results of step 3:
Before this PR:

  • Generation KL Error: 13.0520
  • Avg Reward: 0.0000

After this PR:

  • Generation KL Error: 0.0009
  • Avg Reward: 0.3960

yuki-97 added 3 commits April 10, 2026 07:13
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>

copy-pr-bot bot commented Apr 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.
