Merged
47 commits
416de68
add small llm pretraining
suachong Aug 15, 2025
9c98fa6
Update pretrain_llama31.py
suachong Aug 15, 2025
2159c30
Merge remote-tracking branch 'mlcommons/master'
suachong Aug 16, 2025
08d765e
update README instruction, minor change to callback and set default t…
suachong Aug 16, 2025
7b6544d
set LR for GB32
suachong Aug 18, 2025
2b66faa
update README with download instructions from https://training.mlcomm…
suachong Aug 20, 2025
bf7b2a7
merge mlcommons/master
suachong Aug 20, 2025
fed1bb4
update link to https://github.com/mlcommons/r2-downloader
suachong Aug 20, 2025
7f8f3ec
Revert "update link to https://github.com/mlcommons/r2-downloader"
suachong Aug 20, 2025
bb2d788
merge mlcommons/master
suachong Sep 29, 2025
9f3b94c
Merge remote-tracking branch 'upstream/master'
suachong Jan 19, 2026
c5bff49
initial commit for gpt-oss-20b
suachong Jan 19, 2026
b183204
update README
suachong Jan 19, 2026
a1a6356
update NV dockerfile and readme
suachong Jan 19, 2026
77eaf43
set random seed
suachong Jan 20, 2026
aa7cbd7
update license for amd + clean up rocm dockerfile
suachong Jan 20, 2026
c110d5d
revisit the target log perplexity after establishing rcp
suachong Jan 20, 2026
f856c7f
remove target metric
suachong Jan 20, 2026
83bcba0
remove HF_TOKEN
suachong Feb 2, 2026
4129037
Update run_and_time.sh
mmarcinkiewicz Feb 3, 2026
71c75b8
Create run.sub
mmarcinkiewicz Feb 3, 2026
97dbfe3
Update run_and_time.sh
mmarcinkiewicz Feb 3, 2026
97e2a50
Update README.md
mmarcinkiewicz Feb 3, 2026
14ccfe7
Update run.sub
mmarcinkiewicz Feb 3, 2026
c5bf7cb
Update run.sub
mmarcinkiewicz Feb 3, 2026
de38b48
Update run.sub
mmarcinkiewicz Feb 3, 2026
397ce2b
Update Dockerfile.nvidia
mmarcinkiewicz Feb 3, 2026
c0ff227
Update run.sub
mmarcinkiewicz Feb 3, 2026
ea6e87e
update configs to match 8b, addressed pr comments
suachong Feb 4, 2026
b448b8c
Merge branch 'suachong:master' into master
mmarcinkiewicz Feb 4, 2026
db6a5df
Merge pull request #2 from mmarcinkiewicz/master
suachong Feb 4, 2026
18bdd3a
expose adam_eps with env var
suachong Feb 4, 2026
1c7d631
Merge branch 'master' of https://github.com/suachong/training
suachong Feb 4, 2026
78a267f
update more configs based on hf + nvidia configs
suachong Feb 10, 2026
9f6e611
remove yarn patch and add primus evaluator patch
suachong Feb 12, 2026
381f4ff
update megatron validation consumed samples
suachong Feb 12, 2026
da9fab4
remove utilities, not needed
suachong Feb 20, 2026
a2e9e22
set master port to 29500, consistent with src/train.py
suachong Feb 20, 2026
7ee22e1
remove hardcoded company name
suachong Feb 20, 2026
e8f1003
Revise evaluation metrics and training parameters in README
ShriyaRishab Feb 20, 2026
0c927dc
update target metric and approximate runtime
suachong Feb 27, 2026
0977e91
Merge remote-tracking branch 'upstream/master'
suachong Feb 27, 2026
7681fcb
upload gbs16 logs
suachong Feb 27, 2026
83f6c3e
rm gpt-oss-20b folder
suachong Feb 27, 2026
fd11ec4
remove extra file
suachong Feb 27, 2026
762fdf1
update approximate runtime
suachong Feb 27, 2026
dd661ee
upload gbs32 rcp logs
suachong Mar 4, 2026
2 changes: 1 addition & 1 deletion small_llm_moe_pretraining/primus/README.md
@@ -149,4 +149,4 @@ gpt-oss-20b/primus/
```
# 8. Approximnate runtime

-TBD
+Approximate train time to convergence is ~6.5 hours.
20,170 changes: 20,170 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs16/run_0.log

Large diffs are not rendered by default.

46,648 changes: 46,648 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs16/run_1.log

22,492 changes: 22,492 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs16/run_10.log

20,944 changes: 20,944 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs16/run_11.log

21,718 changes: 21,718 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs16/run_12.log

20,944 changes: 20,944 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs16/run_13.log

23,266 changes: 23,266 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs16/run_14.log

24,040 changes: 24,040 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs16/run_15.log

15,525 changes: 15,525 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs16/run_16.log

16,299 changes: 16,299 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs16/run_17.log

16,299 changes: 16,299 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs16/run_18.log

14,751 changes: 14,751 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs16/run_19.log

20,944 changes: 20,944 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs16/run_2.log

21,718 changes: 21,718 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs16/run_3.log

15,525 changes: 15,525 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs16/run_4.log

14,751 changes: 14,751 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs16/run_5.log

15,525 changes: 15,525 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs16/run_6.log

16,299 changes: 16,299 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs16/run_7.log

15,525 changes: 15,525 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs16/run_8.log

13,492 changes: 13,492 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs16/run_9.log

9,795 changes: 9,795 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs32/run_0.log

9,405 changes: 9,405 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs32/run_1.log

9,015 changes: 9,015 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs32/run_10.log

9,015 changes: 9,015 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs32/run_11.log

9,405 changes: 9,405 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs32/run_12.log

9,015 changes: 9,015 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs32/run_13.log

9,795 changes: 9,795 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs32/run_14.log

9,795 changes: 9,795 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs32/run_15.log

9,015 changes: 9,015 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs32/run_16.log

9,795 changes: 9,795 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs32/run_17.log

9,796 changes: 9,796 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs32/run_18.log

8,626 changes: 8,626 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs32/run_19.log

9,015 changes: 9,015 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs32/run_2.log

9,015 changes: 9,015 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs32/run_3.log

9,015 changes: 9,015 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs32/run_4.log

9,015 changes: 9,015 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs32/run_5.log

9,015 changes: 9,015 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs32/run_6.log

9,405 changes: 9,405 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs32/run_7.log

9,405 changes: 9,405 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs32/run_8.log

10,185 changes: 10,185 additions & 0 deletions small_llm_moe_pretraining/primus/rcp_logs/gbs32/run_9.log
