Changes from all commits
6945 commits
b359981
Fix default cuda graph persist arg. Add persist to rl common.sh. (#3584)
yobibyte Feb 25, 2026
60a25aa
Optimize away add request overheads in dummy ep cuda-graphed forward …
sidsingh-nvidia Feb 25, 2026
e7d80bd
ci: Test docs build (#3583)
ko3n1g Feb 25, 2026
80ef2ae
ci: Fix docs build for release (#3597)
ko3n1g Feb 26, 2026
2561a59
ci: Remove secrets (#3598)
ko3n1g Feb 26, 2026
19444b9
ci: Define secrets (#3599)
ko3n1g Feb 26, 2026
a08ad25
ci: gh-release-from-tag (#3600)
ko3n1g Feb 26, 2026
ac449c4
Ko3n1g/ci/remove twine username (#3601)
ko3n1g Feb 26, 2026
9088d4f
Add training code to MCore wheel (#3573)
maanug-nv Feb 26, 2026
dd4c0b7
FP8 attention knob for nvFP4 recipe (#3363)
vasunvidia Feb 26, 2026
c2782fd
Fix error with --load-main-params-from-ckpt (#3569)
guyueh1 Feb 26, 2026
c921891
ci: Create comment (#3610)
ko3n1g Feb 26, 2026
9100119
ci: Skip cleanup-taint-node jobs during deployments (#3612)
ko3n1g Feb 26, 2026
6161f7a
ci: No comment for release workflow (#3615)
ko3n1g Feb 26, 2026
36d8a9d
ci: Re-add release tag prefix (#3619)
ko3n1g Feb 26, 2026
1b1f5c4
docs: Fix version picker urls (#3621)
chtruong814 Feb 26, 2026
027e0f3
ci: Increase changelog generation max PRs fetched (#3620)
chtruong814 Feb 26, 2026
18c94d2
Add debug info to an assert. (#3588)
yobibyte Feb 26, 2026
e1a9ac9
fix: async_utils: explicit GC in persistent checkpoint worker loop (#…
sbak5 Feb 26, 2026
5f668c1
Fix: Perform sigmoid calculation in fp32 for aux loss stability (#2765)
CodersAcademy006 Feb 26, 2026
d3c10df
fix: forward use_te_activation_func flag in non-MoE GPT layer spec (#…
saakshigupta2002 Feb 26, 2026
7e3f670
Multimodal: Limit transformer version in Dockerfile (#3448)
faradawn Feb 26, 2026
6c0b9c6
Track and plot per-token off-policy in RL (#3515)
tdene Feb 26, 2026
a7c207f
Revert "Add single-process checkpoint save to avoid forked multiproce…
ko3n1g Feb 26, 2026
6503bf8
Multimodal: fix VQA dataset selection (#3464)
faradawn Feb 26, 2026
2f549e5
Multimodal: Fix multimodal training example - tokenizer, Triton Cache…
faradawn Feb 26, 2026
afbce84
Support TP > GQA for inference (#3627)
santhnm2 Feb 26, 2026
310082a
μP: Maximal Update Parameterization (#3058)
plugyawn Feb 26, 2026
7418b1b
Add flexible virtual pipeline parallel (fVPP) to hybrid model (#3377)
duncanriach Feb 26, 2026
53c5973
Explicitly close and join Pool in preprocess_data.py (#3592)
weijiac0619 Feb 27, 2026
36a95ae
remove indexer (#3416)
dimapihtar Feb 27, 2026
f0519b7
Multimodal: add load weights only (#3452)
faradawn Feb 27, 2026
61a293d
Add single-process checkpoint save to avoid forked multiprocessing (#…
sbak5 Feb 27, 2026
6287e7f
Update oncall schedule (#3632)
Phlip79 Feb 27, 2026
93d2739
Update copy-pr-bot.yaml [skip ci]
github-actions[bot] Feb 28, 2026
e8fe068
M-FSDP: Cancel erroneous grad accumulation check (#3629)
shjwudp Mar 2, 2026
107f6ae
chore(beep boop 🤖): Bump (main) (2026-03-02)
github-actions[bot] Mar 2, 2026
c7be214
Fix MoE aux loss tracker hang with MTP enabled (#3401)
Victarry Mar 2, 2026
044f1e3
Fix test data preparation (#3652)
janEbert Mar 2, 2026
2f8c9bc
Add GPTOSS Example with Megatron-LM + Megatron Bridge (#3018)
faradawn Mar 2, 2026
63cd60b
Add thd unit test main (#3617)
kunlunl Mar 2, 2026
c9312e6
Inference | KV prefix caching. (#3063)
lmcafee-nvidia Mar 2, 2026
b969f76
[Megatron-FSDP] Add dtype customization to Megatron-FSDP. (#3067)
cspades Mar 2, 2026
0810e63
CachedMetadataFileSystemReader: shared cache (#3326)
sbak5 Mar 2, 2026
7d1c016
Inference Optimized MoEs (#3496)
sidsingh-nvidia Mar 3, 2026
98495af
Log torch_memory_saver offload/onload (#3567)
tdene Mar 3, 2026
6fc7690
Prefix caching | Mamba memory only. (#3657)
lmcafee-nvidia Mar 3, 2026
9b18de4
Prefix caching | Coordinator scheduling. (#3665)
lmcafee-nvidia Mar 3, 2026
7dee32a
Adding manual Claude reviewer (#3679)
Phlip79 Mar 3, 2026
2570947
Nemo-RL Refit (#3520)
wdykas Mar 3, 2026
fa93d79
Add extra permissions and make other changes (#3683)
Phlip79 Mar 4, 2026
77a00ec
Claude should always comment something (#3685)
Phlip79 Mar 4, 2026
2caa681
[Cleanup] Remove the deprecated GroupedMLP (#3410)
dimapihtar Mar 4, 2026
470c6ea
chore: rotate oncall schedule
github-actions[bot] Mar 4, 2026
2a931a3
Fix illegal memory access with mamba inference (#3631)
tdene Mar 4, 2026
bb31e93
Fix illegal memory access with mamba inference (bis) (#3696)
tdene Mar 4, 2026
4fa9b5a
remove duplicate rerun_state_machine.set_mode(rerun_mode) (#3279)
YangWang92 Mar 4, 2026
d3a8584
Correct indexing when cp_comms_type is a list (#3389)
jeromeku Mar 4, 2026
bfd160b
Fix optional chat_completions returnables (#3519)
tdene Mar 4, 2026
f84e84e
ci: Claude code review (#3704)
ko3n1g Mar 4, 2026
bd1406f
ci: Fix event payload (#3705)
ko3n1g Mar 4, 2026
33476ff
ci: Use issue number (#3706)
ko3n1g Mar 4, 2026
fd21af4
ci: Finalize Claude review (#3707)
ko3n1g Mar 4, 2026
b1b4df6
ci: Add codecov yml (#3455)
thomasdhc Mar 4, 2026
7ea354b
Robust signaling for coordinator inference (#3563)
tdene Mar 4, 2026
f91c4bb
Fix memory issue in mxfp8 model init (#3461)
WanZzzzzz Mar 4, 2026
c5d8f6b
Update copy-pr-bot.yaml [skip ci]
github-actions[bot] Mar 5, 2026
fa6063d
adding public_docs_features: True to get proper legal footer… (#3681)
megnvidia Mar 4, 2026
a2381d8
add --overlap-param-gather support for layer-wise optimizer. lots of …
mchrzanowski Mar 5, 2026
657d33b
ci: Mount and enforce HF_HOME (#3700)
ko3n1g Mar 5, 2026
da47e64
Add flags for changing Mamba inference state tensor dtype (#3660)
santhnm2 Mar 5, 2026
94a903b
chore: CLI launch internal CI (#3695)
ko3n1g Mar 5, 2026
485428f
Change Review Process (#3659)
Phlip79 Mar 5, 2026
d1cce0c
ci: Separate queues for internal/external contributors (#3718)
ko3n1g Mar 5, 2026
6bd5c12
Update to correct token (#3724)
Phlip79 Mar 5, 2026
f5b2ec0
build: Bump to NGC PyTorch 26.02 (#3474)
ko3n1g Mar 5, 2026
bb84493
Claude: use Opus 4.6 and auto-review on ready (#3727)
Phlip79 Mar 6, 2026
41daf81
Claude to add complexity label (#3709)
Phlip79 Mar 6, 2026
0d42bc6
Offload Flask frontend to separate process (#3648)
santhnm2 Mar 6, 2026
d3528a2
fix(moe): fix TE general_gemm API change (#3582)
hxbai Mar 6, 2026
43df309
Review process fixes (#3728)
Phlip79 Mar 6, 2026
bde8264
ci: Update golden values after PyT bump (#3733)
ko3n1g Mar 6, 2026
a979332
chore: Use PAT for CLI Launcher (#3734)
ko3n1g Mar 6, 2026
bb451db
Print more verbose error message about incorrect `model_parallel_size…
rj42 Mar 6, 2026
de63aa8
ci: Add missing gitlab rule (#3735)
ko3n1g Mar 6, 2026
37ca715
[main] Add TE CUDA Graph Support for Vision Encoder (#3293)
tomlifu Mar 6, 2026
17de0db
Optimize process management and delete operations for async save (#3262)
sbak5 Mar 6, 2026
6ec369d
Align gpt-oss window-size with 128-token sliding window (#2771)
returnL Mar 6, 2026
26f9444
fix: temperature validation error message 1000.0 -> 100.0 (#2688)
CreeperLKF Mar 6, 2026
e19fbe2
RL: Hybrid MoE training cudagraphs and fix training <-> inference tra…
mathemakitten Mar 6, 2026
c1e675f
Fix dynamic inference and GRPO functional tests (#3740)
santhnm2 Mar 6, 2026
0cfa420
Swap oncall (#3585)
janEbert Mar 6, 2026
932b767
Update copy-pr-bot.yaml [skip ci]
github-actions[bot] Mar 7, 2026
b09ee64
[bugfix] fix the bug that loss: 0 will not be printed (#1555)
leisuzz Mar 9, 2026
8318b80
Fused dLN + add in backwards pass (#3384)
CarlosGomes98 Mar 9, 2026
597721a
chore(beep boop 🤖): Bump (main) (2026-03-09)
github-actions[bot] Mar 9, 2026
116a7fa
Claude: run actions on target branch (#3745)
Phlip79 Mar 9, 2026
56158bb
revert of #2658 (#3736)
dimapihtar Mar 9, 2026
452fc11
Update README Quick Start (#3596)
ilml Mar 9, 2026
d904a68
Re-enable tests which were failing on #3373 (#3757)
mathemakitten Mar 9, 2026
94ff0dc
Check reviews properly (#3756)
Phlip79 Mar 9, 2026
ce66b22
Update copy-pr-bot.yaml [skip ci]
github-actions[bot] Mar 10, 2026
0e19bf1
Add CP + Sequence Packing support for Mimo (#2135)
mehraakash Mar 10, 2026
fca1679
MXFP8 refit (#3742)
wdykas Mar 10, 2026
397772e
Claude: update token usage (#3760)
Phlip79 Mar 10, 2026
3eea580
Handle Tool Call Argument Parsing (#3662)
sancha Mar 10, 2026
e970199
RL support for nanov3 sft checkpoint (#3741)
jon-barker Mar 10, 2026
ba497c9
add mix_hidden_states option in conversion (#3655)
yeyu-nvidia Mar 10, 2026
22c69fa
ci: Optimize release-configs for GB200 (#3541)
ko3n1g Mar 10, 2026
0f47a1a
Update copy-pr-bot.yaml [skip ci]
github-actions[bot] Mar 11, 2026
204e7d5
Add absorbed-mla (#3198)
kunlunl Mar 10, 2026
f544034
feat(checkpoint): zero-copy storage sharing in CheckpointWithoutOutpu…
Victarry Mar 11, 2026
8fd390d
Fuse MLA DOWN projection GEMMs (#3039)
cjld Mar 11, 2026
7d52694
fix: skip FSDP DTensor boundary validation under fake process group (…
Victarry Mar 11, 2026
16a8cdb
[main] fix(moe): fix the bug where gate was not sliced when kv_head <…
yuzhongw-nvidia Mar 11, 2026
b8e23d5
fix(offload): reset activation offload manager after eval as well as …
rapatel Mar 11, 2026
07e512a
chore: rotate oncall schedule
github-actions[bot] Mar 11, 2026
e20b89a
Improve error logging when invalid number of tokens is requested. (#3…
yobibyte Mar 11, 2026
5bc89f3
Add NVIDIA-Nemotron-3-Super-120B-A12B-BF16 to ModelOpt examples (#3805)
jenchen13 Mar 11, 2026
da46946
build: Bump TE2.13 (#3800)
ko3n1g Mar 11, 2026
8e64e69
Ensure dummy_forward does not attempt to run cudagraphs (#3789)
jalbericiola Mar 11, 2026
8f539df
Add speculative decoding support with MTP layers (#3594)
santhnm2 Mar 11, 2026
39472d8
Shanmugamr1992/megatron inference ultra (#3784)
shanmugamr1992 Mar 11, 2026
d997820
Fix backward compatibility issue with MFSDP `--grad-reduce-in-bf16` (…
shjwudp Mar 12, 2026
251a754
feat: add NCCL flight recorder configuration support (#3806)
sbak5 Mar 12, 2026
5a3aa17
Revert "Ensure dummy_forward does not attempt to run cudagraphs (#378…
ko3n1g Mar 12, 2026
5b25326
Fix if statement in main (#3833)
tdene Mar 12, 2026
6657173
Update golden values of weekly tests (#3829)
ko3n1g Mar 12, 2026
4736aed
build: Loosen TE restriction (#3827)
ko3n1g Mar 12, 2026
1d5e68b
Upgrade GitHub Actions for Node 24 compatibility (#3830)
ko3n1g Mar 12, 2026
46227e0
Do not let chunked prefill generate decode logprobs (#3777)
tdene Mar 12, 2026
e08dc9d
Prevent double serialization inside Flask server (#3653)
tdene Mar 12, 2026
29e798a
Allow RL to run inference-only via skip-train (#3744)
tdene Mar 12, 2026
4c1d0e4
Announce Python 3.12 migration (#3825)
ko3n1g Mar 12, 2026
fabbcdf
Update copy-pr-bot.yaml [skip ci]
github-actions[bot] Mar 13, 2026
b250472
ci: Skip test_wrong_cuda_graph_impl_returns_false in LTS (#3847)
chtruong814 Mar 13, 2026
7ca9dc5
ci: Mark TestCoordinator.test_throughput as flaky (#3849)
chtruong814 Mar 13, 2026
f261906
find optimal number of workers (#3699)
dimapihtar Mar 13, 2026
b7437fe
remove encoder_and_decoder (#3836)
dimapihtar Mar 13, 2026
8a806e5
ci: Skip more tests in test_vision_cuda_graphs for LTS (#3860)
chtruong814 Mar 13, 2026
90fee1f
Ensure that inference dummy_forward does not try to match on a cudagr…
mathemakitten Mar 13, 2026
d1b8e27
Add unit tests for speculative decoding (#3817)
santhnm2 Mar 13, 2026
d4ac04f
Fix flakiness due to timing between shutdowns (#3857)
tdene Mar 13, 2026
87eb3c2
Exposing interleave argument for fused_apply_rotary_pos_emb_thd (#3794)
huvunvidia Mar 13, 2026
9a19203
ci: install nvidia-resiliency-ext from source (#3861)
ko3n1g Mar 13, 2026
f8becec
Miscellaneous inference bug fixes (#3840)
santhnm2 Mar 14, 2026
905c0e3
Nemo-RL integration bugfixes for --transformer-impl inference_optimiz…
sidsingh-nvidia Mar 15, 2026
5dca153
chore(beep boop 🤖): Bump (main) (2026-03-16)
github-actions[bot] Mar 16, 2026
b5b1994
remove legacy mpu (#3854)
dimapihtar Mar 16, 2026
086777a
enable async save for functional tests (#3855)
dimapihtar Mar 16, 2026
925341d
remove legacy data (#3853)
dimapihtar Mar 16, 2026
8334576
docs: Document python-gitlab dependency (#3863)
ko3n1g Mar 16, 2026
9a494f1
Fsdp dsv3 proxy (#3844)
gautham-kollu Mar 16, 2026
feeb1b4
Fix token dispatched cudagraph_attrs (#3625)
asolergi-nv Mar 16, 2026
3b34761
Fix slowdown in serialization (#3872)
tdene Mar 16, 2026
c1a14fb
Establish reviewers for training code (#3765)
maanug-nv Mar 16, 2026
f89744b
Fix quantize.py script and support packed sequences in pretrain_gpt.p…
AAnoosheh Mar 16, 2026
72b10a8
Use fp32 state dtypes for Mamba inference functional test (#3888)
santhnm2 Mar 16, 2026
ff70b24
[Megatron-FSDP] Support 'auto' argument which defaults to pre-MixedPr…
cspades Mar 17, 2026
3e9d8ca
Bug fix: add missing packages to Multimodal Dockerfile (#3417)
faradawn Mar 17, 2026
43675d4
Reverse polarity of the off-policy measurement (#3580)
tdene Mar 17, 2026
84e0360
Update nightly golden values after TE2.13 (#3886)
ko3n1g Mar 17, 2026
d3c4b05
enable use_persistent_ckpt_worker for ci tests (#3898)
dimapihtar Mar 17, 2026
02e0ca5
Correctly generate state dict in MultiTokenPredictionBlock (#3624)
asolergi-nv Mar 17, 2026
589cd9e
Add torch grouped gemm bf16 and mxfp8 support w/ cuda graphed + infer…
sidsingh-nvidia Mar 17, 2026
dbcd5d9
ci: Fix build-test-publish summary job always passing (#3905)
ko3n1g Mar 17, 2026
a4888bc
ci: Skip gpt3_mcore_te_tp1_pp4_vp1 for now (#3908)
chtruong814 Mar 17, 2026
c058fc0
ci: Fix build-and-test-wheels jobs for arm (#3910)
chtruong814 Mar 18, 2026
83498ef
Add Lion optimizer support (#3813)
mchrzanowski Mar 18, 2026
0ca9b63
Support multimodule pipelining in 1F1B schedule (#3129)
yashaswikarnati Mar 18, 2026
fde4059
Add a config parameter for retaining pinned cpu buffers for cpu offlo…
rapatel Mar 18, 2026
c4bffde
Inference | Hybrid prefix caching. (#3225)
lmcafee-nvidia Mar 18, 2026
77a706e
chore: rotate oncall schedule
github-actions[bot] Mar 18, 2026
77c2095
Parity with VLLM over the reasoning field (#3873)
tdene Mar 18, 2026
4402add
Hotfix for eviction issue (#3914)
tdene Mar 18, 2026
0186ee0
CI: add parallel GB200 integration test track (#3901)
ko3n1g Mar 18, 2026
ee00a70
Track errors through the inference return path (#3776)
tdene Mar 18, 2026
1259982
Fix: Defensively close GPU device FDs in dataloader worker processes …
hexinw-nvidia Mar 18, 2026
49be885
Fix hybrid dynamic inference functional tests (#3924)
santhnm2 Mar 18, 2026
35d5c65
Patch EOD out of inference results (#3866)
tdene Mar 18, 2026
5a00bd2
ci: Add mr-github-slim label (#3934)
ko3n1g Mar 18, 2026
b488149
Revert "ci: Skip gpt3_mcore_te_tp1_pp4_vp1 for now (#3908)" (#3926)
chtruong814 Mar 18, 2026
54ddab6
Exclude arguments.py from training review (#3906)
maanug-nv Mar 18, 2026
3548385
Fix DDP bug with --overlap-grad-reduce and --num-distributed-optimize…
wplf Mar 19, 2026
9425601
Update copy-pr-bot.yaml [skip ci]
github-actions[bot] Mar 19, 2026
dde4701
Implement forced lag in RL (#3517)
tdene Mar 19, 2026
9ed8b0c
Fix incorrect HAVE_TE detection in multiple modules (#3763)
returnL Mar 19, 2026
e01d886
ci: Fix sso users check (#3938)
chtruong814 Mar 19, 2026
2276e3a
move router replay doc to advanced feature part (#3929)
ilml Mar 18, 2026
36b84d4
Refactor VisionTECudaGraphHelper to minimize overrides and clarify st…
buptzyb Mar 19, 2026
c64c64b
Fix external contributor concurrency to be global across all branches…
ko3n1g Mar 19, 2026
5811854
Fix 3-way merge issue that broke main (#3949)
tdene Mar 19, 2026
f9a4196
Fix Nemo_CICD_Test not catching cancelled/skipped functional tests (#…
ko3n1g Mar 19, 2026
9ff763a
Guard cudagraph input copy on whether data pointers have actually cha…
mathemakitten Mar 19, 2026
7688557
Enforce that flashinfer cache has been installed for inference-optimi…
santhnm2 Mar 19, 2026
40dce4e
chore: remove nv-grouped-gemm dependency (#3770)
liuyun7345 Mar 19, 2026
2bee700
Prevent failures due to prevent_retokenization (#3958)
tdene Mar 20, 2026
9a60a18
ultra refit (#3904)
wdykas Mar 20, 2026
45b8eac
[Fix][Main] Missing Assertion for moe layer recomptue in A2A Overlap …
Wohox Mar 20, 2026
f456199
Move Megatron-FSDP MixedPrecisionPolicy arguments from FSDP adapter t…
cspades Mar 20, 2026
9382abc
chore: bump FW-CI-templates to v0.80.2 (#3961)
ko3n1g Mar 20, 2026
5c87d9a
Rename RL timers to be consistent (#3878)
tdene Mar 20, 2026
c5f9dd3
ci: centralize run configuration in a single configure job (#3962)
ko3n1g Mar 20, 2026
70bda97
ci: Split unit tests into smaller groups (#3966)
ko3n1g Mar 20, 2026
6e6f0b7
Refit optimization (#3933)
wdykas Mar 20, 2026
2beb593
common strategy simplification (#3229)
dimapihtar Mar 20, 2026
197242e
Cudagraphs: Remove fwd_graph_input_surface weakref (#3970)
mathemakitten Mar 20, 2026
a9aa2c8
fix: interpolate version correctly in release Slack notification (#3977)
ko3n1g Mar 21, 2026
3f59f71
Make args and kwargs optional positional arguments for the Module hoo…
cspades Mar 21, 2026
ade81bc
ci: Add core-adlr and core-nemo to megatron/training codeowners (#3979)
chtruong814 Mar 21, 2026
b4e4d96
Small quality-of-life improvements in `megatron/training` (#3957)
deepakn94 Mar 21, 2026
f5b0ec6
Update throughput golden values to reflect speedup (#3983)
tdene Mar 21, 2026
a811ac1
Revert "ci: Add core-adlr and core-nemo to megatron/training codeowne…
chtruong814 Mar 21, 2026
b1c9403
ci: Add --repo flag to gh pr view in configure job (#3989)
ko3n1g Mar 22, 2026
febd25e
Add common pile scripts (#3902)
Phlip79 Mar 22, 2026
8f7fbe7
Introduce GDN to Mamba (#3535)
Phlip79 Mar 23, 2026
9054192
Fix IndexError in uniform activation recompute when num_layers not di…
saakshigupta2002 Mar 23, 2026
f58a328
Scaling for MuP over Muon optimizer. (#3715)
plugyawn Mar 23, 2026
947bdbb
Pass Megatron-FSDP MixedPrecision args to DDPConfig. (#3992)
cspades Mar 23, 2026
488ba8e
[OMNIML-3721] Fix tokenizer unwrapping for nested Megatron-Core token…
jenchen13 Mar 24, 2026
7da5b28
Forced load imbalance (#3380)
nanz-nv Mar 24, 2026
9e28104
Add `/claude copy` command (#3978)
Phlip79 Mar 24, 2026
70a89af
Add multi-module heterogeneous parallelism support for MIMO model (#3…
yashaswikarnati Mar 24, 2026
99638d0
added vllm fakequant export support (#3050)
kinjalpatel27 Mar 24, 2026
a9e4437
fix(modelopt): use bash array for MLM_EXTRA_ARGS to preserve quoting …
jenchen13 Mar 24, 2026
34e1e97
fix: use dump file prefix for NCCL flight recorder temp files (#3955)
sbak5 Mar 24, 2026
b44a23c
Fix PersistentAsyncCaller.__del__ crash during Python shutdown (#3781)
cluster2600 Mar 24, 2026
0065f9f
ci: Run L1 MBridge tests in merge queue (#4009)
chtruong814 Mar 24, 2026
74b0e69
Update Claude review (#3980)
Phlip79 Mar 24, 2026
4f49f2f
Migrate MoeLayer submodules from ModuleSpec to Protocols (#3426)
nschank Mar 24, 2026
3aa7396
Guard non-core imports (#3993)
maanug-nv Mar 24, 2026
7bca3c8
Fix config compatibility with Megatron-Core (#3995)
maanug-nv Mar 24, 2026
d86ba0b
Add MimoOptimizer for heterogeneous parallelism (#4019)
yashaswikarnati Mar 25, 2026
6a7b68e
[Main] Support EP Overlap's Dynamic Computation Stream For Full-Iter …
Wohox Mar 25, 2026
a8530db
fix: Handle quantized CUDA tensors in async checkpoint writer (#3845)
sbak5 Mar 25, 2026
934edfa
accept hooks marked with with_kwargs when using te.ops.sequential (#4…
CarlosGomes98 Mar 25, 2026
d31a21f
chore: rotate oncall schedule
github-actions[bot] Mar 25, 2026
1aee2d7
Use GroupedMLPSubmodules for InferenceGroupedMLP (#3743)
nschank Mar 25, 2026
694c3a9
Fix 2D tensor communication for asymmetric DP in Bridge Communicator …
yashaswikarnati Mar 25, 2026
1df5591
Add distributed checkpoint support for non-colocated MiMo (#4020)
yashaswikarnati Mar 25, 2026
d45fa3e
CUDA graph support for prefix caching on hybrid models (#3922)
lmcafee-nvidia Mar 25, 2026
c586f6d
Add ability to perform local gradient accumulation in FP32 for a subs…
deepakn94 Mar 25, 2026
09cce75
Miscellaneous MXFP8 inference fixes (#4017)
santhnm2 Mar 26, 2026
a01a6c5
Use `torch.int64` for grad_num_zero accumulation (#4015)
WanZzzzzz Mar 26, 2026
548028b
Make text generation server hostname configurable (#3935)
santhnm2 Mar 26, 2026
0842ca2
Add --muon-coefficient-type argument for Muon optimizer (#3927)
mchrzanowski Mar 26, 2026
606afda
Pass gracefully if token_id not found in message (#3862)
i-riyad Mar 26, 2026
0528a40
Improve load balancing behavior for prefix cache-aware routing (#3930)
santhnm2 Mar 26, 2026
58e0b85
Refactor setup.py to use get_pybind_include (#3658)
sakgoyal Mar 27, 2026
3758b54
build: Bump TE to 2.14 (#4025)
ko3n1g Mar 27, 2026
39 changes: 39 additions & 0 deletions .coderabbit.yaml
@@ -0,0 +1,39 @@
# yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
language: "en-US"

# Only comment on Critical/Major bugs. No Minor, Trivial, or style comments.
tone_instructions: "Only comment on Critical or Major bugs. Never comment on Minor issues, style, refactoring, or suggestions. When in doubt, stay silent."

reviews:
# Use chill profile - filters out nitpicks automatically
profile: "chill"

# Disable all summary features
high_level_summary: false
high_level_summary_in_walkthrough: false

# Disable walkthrough comment entirely
collapse_walkthrough: true
changed_files_summary: false
sequence_diagrams: false

# Disable status/effort estimates
review_status: false
commit_status: false
estimate_code_review_effort: false

# Disable auto-suggestions for labels/reviewers
suggested_labels: false
suggested_reviewers: false

# Disable related issues/PRs lookup
assess_linked_issues: false
related_issues: false
related_prs: false

# Auto-review disabled - only review when explicitly requested via @coderabbitai review
auto_review:
enabled: false

chat:
auto_reply: true
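The intent of this config — review only on explicit request, and surface only Critical/Major findings — can be sketched as a simple gate. The function and severity names below are illustrative only; they model the configured behavior, not CodeRabbit's actual internals.

```python
# Hypothetical model of the gating this config aims for. Severity labels
# and the function name are assumptions for illustration.

SURFACED_SEVERITIES = {"critical", "major"}

def should_post_comment(severity: str, auto_triggered: bool) -> bool:
    """Return True if a review finding would be posted under this config."""
    if auto_triggered:  # auto_review.enabled: false -> only on-demand reviews
        return False
    # tone_instructions: only Critical/Major; stay silent otherwise
    return severity.lower() in SURFACED_SEVERITIES

print(should_post_comment("Critical", auto_triggered=False))  # True
print(should_post_comment("minor", auto_triggered=False))     # False
print(should_post_comment("critical", auto_triggered=True))   # False
```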
4 changes: 4 additions & 0 deletions .flake8
@@ -0,0 +1,4 @@
[flake8]
max-line-length = 100
extend-ignore = E203,E501,F401,E402,E714
per-file-ignores = __init__.py:F401
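This is plain INI syntax, so the resolved values can be inspected with nothing but the stdlib. One side effect worth noting: with `E501` in `extend-ignore`, flake8 itself will not flag long lines, so `max-line-length = 100` mainly informs other tools that read this file. A quick sketch:

```python
import configparser
from io import StringIO

# The .flake8 file contents from the diff above, inlined for a self-contained check.
FLAKE8 = """\
[flake8]
max-line-length = 100
extend-ignore = E203,E501,F401,E402,E714
per-file-ignores = __init__.py:F401
"""

cfg = configparser.ConfigParser()
cfg.read_file(StringIO(FLAKE8))

codes = cfg["flake8"]["extend-ignore"].split(",")
print(int(cfg["flake8"]["max-line-length"]))  # 100
print("E501" in codes)  # True -> line-length checks are effectively off
```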
64 changes: 64 additions & 0 deletions .github/CODEOWNERS
@@ -0,0 +1,64 @@
megatron/core/ @NVIDIA/core-adlr @NVIDIA/core-nemo

megatron/core/models/gpt/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/gpt

megatron/core/models/multimodal/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/multi-modal

megatron/core/models/mamba/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/hybrid-mamba
megatron/core/ssm/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/hybrid-mamba

megatron/core/datasets/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/datasets

megatron/core/tokenizers/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/tokenizers

megatron/core/distributed/fsdp/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/megatron-fsdp

megatron/core/transformer/fsdp_dtensor_checkpoint.py @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/megatron-fsdp

megatron/core/dist_checkpointing/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/dist-checkpointing

megatron/core/optimizer/distrib_optimizer/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/dist-optimizer

megatron/core/inference/modelopt_support @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/quantization-and-inference

megatron/core/datasets/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/datasets

megatron/core/pipeline_parallel/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/pipeline-parallelism

megatron/core/transformer/ @NVIDIA/core-adlr @NVIDIA/core-nemo

megatron/core/transformer/moe/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/mixture-of-experts-adlr @NVIDIA/mixture-of-experts-devtech

megatron/core/inference/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/inference

megatron/core/parallel_state.py @NVIDIA/core-adlr @NVIDIA/core-nemo

megatron/core/post_training/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/post-training

megatron/post_training/ @NVIDIA/post-training

megatron/core/transformer/cuda_graphs.py @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/cuda-graphs

megatron/training/ @NVIDIA/training-adlr @NVIDIA/training-nemo
megatron/training/arguments.py

.gitlab/ @NVIDIA/ci
.github/ @NVIDIA/ci
.gitlab-ci.yml @NVIDIA/ci
docker/ @NVIDIA/ci
tests/functional_tests/python_test_utils/ @NVIDIA/ci
tests/functional_tests/shell_test_utils/ @NVIDIA/ci
tests/test_utils/recipes/ @NVIDIA/ci
tests/unit_tests/run_ci_test.sh @NVIDIA/ci

# API Backwards Compatibility Check
scripts/check_api_backwards_compatibility.py @NVIDIA/ci
scripts/README_API_COMPAT.md @NVIDIA/ci
.github/workflows/check_api_backwards_compatibility_workflow.yml @NVIDIA/ci
docs/api-backwards-compatibility-check.md @NVIDIA/ci
tests/unit_tests/test_api_backwards_compat_setup.py @NVIDIA/ci

megatron/rl/ @NVIDIA/reinforcement-learning
examples/rl/ @NVIDIA/reinforcement-learning
test/unit_tests/test_rl_utils.py @NVIDIA/reinforcement-learning
train_rl.py @NVIDIA/reinforcement-learning
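GitHub resolves owners by taking the last matching pattern in the file, which is why the owner-less `megatron/training/arguments.py` line acts as an exclusion: it overrides the `megatron/training/` rule with an empty owner set. A toy matcher restricted to the directory-prefix and exact-path patterns used here (not a full gitignore-style implementation):

```python
# Minimal sketch of CODEOWNERS resolution: the LAST matching pattern wins.
# Rules are a small excerpt of the file above; the matcher is illustrative.

RULES = [
    ("megatron/core/", ["@NVIDIA/core-adlr", "@NVIDIA/core-nemo"]),
    ("megatron/core/transformer/", ["@NVIDIA/core-adlr", "@NVIDIA/core-nemo"]),
    ("megatron/core/transformer/moe/", ["@NVIDIA/core-adlr", "@NVIDIA/core-nemo",
                                        "@NVIDIA/mixture-of-experts-adlr",
                                        "@NVIDIA/mixture-of-experts-devtech"]),
    ("megatron/training/", ["@NVIDIA/training-adlr", "@NVIDIA/training-nemo"]),
    ("megatron/training/arguments.py", []),  # owner-less pattern clears reviewers
]

def owners(path: str) -> list:
    result = []
    for pattern, teams in RULES:  # later rules override earlier matches
        if pattern.endswith("/") and path.startswith(pattern):
            result = teams
        elif path == pattern:
            result = teams
    return result

print(owners("megatron/core/transformer/moe/router.py"))
print(owners("megatron/training/arguments.py"))  # -> [] (no required owners)
```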
29 changes: 29 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,29 @@
---
name: Bug report
about: Create a report to help us improve the repository or project
title: ""
labels: bug
assignees: ''

---

**Describe the bug**

A clear and concise description of what the bug is. Tag the [@mcore-oncall](https://github.com/orgs/NVIDIA/teams/mcore-oncall)
to bring this issue to the oncall's attention.

**Steps/Code to reproduce bug**

Please list *minimal* steps or code snippet for us to be able to reproduce the bug.

A helpful guide on how to craft a minimal bug report: http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports.


**Expected behavior**

A clear and concise description of what you expected to happen.


**Additional context**

Add any other context about the problem here.
2 changes: 2 additions & 0 deletions .github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,2 @@
blank_issues_enabled: false

23 changes: 23 additions & 0 deletions .github/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,23 @@
---
name: Feature request
about: Suggest an idea for this project
title: ""
labels: enhancement
assignees: ''

---

**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Tag the [@mcore-oncall](https://github.com/orgs/NVIDIA/teams/mcore-oncall)
to bring this issue to the oncall's attention.

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context or screenshots about the feature request here.
13 changes: 13 additions & 0 deletions .github/ISSUE_TEMPLATE/question.md
@@ -0,0 +1,13 @@
---
name: QUESTION
about: Ask a question about Megatron-LM that is not a bug, regression or enhancement request
title: "[QUESTION]"
labels: ''
assignees: ''

---

**Your question**
Ask a clear and concise question about Megatron-LM. Tag the [@mcore-oncall](https://github.com/orgs/NVIDIA/teams/mcore-oncall)
to bring this issue to the oncall's attention.
40 changes: 40 additions & 0 deletions .github/ISSUE_TEMPLATE/regression.md
@@ -0,0 +1,40 @@
---
name: REGRESSION
about: Report a regression in speed or accuracy due to a Megatron-LM update
title: "[REGRESSION]"
labels: ''
assignees: ''

---

**Describe the regression**
A clear and concise description of what the regression is. Tag the [@mcore-oncall](https://github.com/orgs/NVIDIA/teams/mcore-oncall)
to bring this issue to the oncall's attention.

**To Reproduce**
Steps to reproduce the behavior. The easier it is to reproduce, the faster it will get maintainer attention.

**Previous performance**
What speed or accuracy did you previously see?

**New performance**
What speed or accuracy do you see after the update?

**Stack trace/logs**
If applicable, add the stack trace or logs related to the regression.

**Environment (please complete the following information):**
- Previous Megatron-LM commit ID
- New Megatron-LM commit ID
- Previous PyTorch version
- New PyTorch version
- Previous CUDA version
- New CUDA version
- Previous NCCL version
- New NCCL version

**Proposed fix**
If you have a proposal for how to fix the issue, state it here or link to a PR.

**Additional context**
Add any other context about the problem here.