Skip to content

feat: Add pinned memory optimizer offload for Megatron policy worker#2248

Open
snivertynv wants to merge 1 commit intomainfrom
sniverty/optimizer_offload_perf
Open

feat: Add pinned memory optimizer offload for Megatron policy worker#2248
snivertynv wants to merge 1 commit intomainfrom
sniverty/optimizer_offload_perf

Conversation

@snivertynv
Copy link
Copy Markdown

This significantly improves performance for the optimizer_offload_before_refit pass which is quite expensive in co-located/syncRL cases.

Enabled/disabled using the use_pinned_optimizer_offload setting (default=disabled). It has been set to false in a couple of grpo_math* yaml configs as an example. Added test cases for this feature in test_megatron_worker.py

What does this PR do ?

Optimizer D2H/H2D transfers used per-tensor pageable allocations, causing expensive cudaHostAlloc calls and synchronous memcpy on every step. This adds an opt-in mode (use_pinned_optimizer_offload) that coalesces all optimizer state into a single cached pinned buffer, eliminating cudaHostAlloc from the hot path and enabling non-blocking DMA transfers.

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

…olled using the use_pinned_optimizer_offload setting - set to false in a couple of grpo_math* yaml configs as an example. Added test cases for this feature in test_megatron_worker.py

Signed-off-by: Sriharsha Niverty <sniverty@nvidia.com>
@snivertynv snivertynv self-assigned this Apr 10, 2026
@snivertynv snivertynv requested review from a team and terrykong as code owners April 10, 2026 14:21
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot bot commented Apr 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@snivertynv snivertynv added the CI:L2 Run doctests, unit tests, functional tests, and convergence tests label Apr 10, 2026
@snivertynv
Copy link
Copy Markdown
Author

/ok to test 13e7f01

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI:L2 Run doctests, unit tests, functional tests, and convergence tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant