Commits
32 commits
67630c6
A4 & A4X TRTLLM GKE Single-Host Inference Recipes, ReadMe and Config …
hmhv1222 Mar 13, 2026
875e36b
Add benchmarking configs warning paragraphs in TRTLLM inference ReadMe
hmhv1222 Mar 13, 2026
5db07bb
Remove memory resources constraint in A4X serving-launcher.yaml and c…
hmhv1222 Mar 13, 2026
ae7651c
Adding WAN Recipe for A4 (#144)
depksingh Mar 16, 2026
b7b3c25
Adding the A4x wan receipe (#143)
Priya-Quad Mar 16, 2026
fbe277b
a4x cs llama 405b
notabee Mar 16, 2026
d0b19ef
Adding a recipe for Llama3.1-70B with gbs 256 scaled Recipe from 16 n…
incredere Mar 18, 2026
4eacf50
Update submit.slurm
notabee Mar 18, 2026
813c0e2
Add Qwen3 235B A22B recipe on 16-node B200 #recipebot
weikuo0506 Mar 19, 2026
19427f9
Update README to match Qwen3 235B 16-node recipe template
weikuo0506 Mar 19, 2026
cbac6ec
Update README to match the exact template structure
weikuo0506 Mar 19, 2026
2e9bd49
temp
Alina-PANG Mar 23, 2026
a382c47
Add Qwen3 235B A22B FP8MX GBS8192 recipe on 32-node B200 #recipebot
weikuo0506 Mar 19, 2026
fc2904c
Update README to use exact strict formatting of template
weikuo0506 Mar 19, 2026
82eb1b9
Move 16node-BF16-GBS4096 files into recipe directory #recipebot
weikuo0506 Mar 19, 2026
b9ad8d2
Add GPT-OSS 120B 8-node BF16 recipe #recipebot
weikuo0506 Mar 19, 2026
eb44b3e
Add DeepSeek V3 32-node FP8MX SEQ4096 GBS4096 recipe #recipebot
weikuo0506 Mar 23, 2026
0055f1d
Restructure qwen3 recipes by nemo version
weikuo0506 Mar 23, 2026
faf33ad
Add QWEN3 235B 32-node BF16 SEQ4096 GBS4096 recipe #recipebot
weikuo0506 Mar 23, 2026
afa3033
Add Llama 3.1 405B FP8CS 16-node recipe on a4 (nemo2602)
Alina-PANG Mar 23, 2026
cdbf60b
Add recipe for Llama3.1 405B FP8CS B200 256GPUs
Alina-PANG Mar 23, 2026
047bce8
Add Qwen3 30B Nemo Pretraining recipe on A4
Alina-PANG Mar 24, 2026
178c961
Update DeepSeek V3 32-node BF16 NEMO26.02 recipe with optimized launc…
weikuo0506 Mar 19, 2026
ad5cb23
Move 32node recipe into nemo2602 subdirectory and update README
weikuo0506 Mar 23, 2026
edb4a0e
Move 32node NEMO25.11 recipe into nemo2511 subdirectory
weikuo0506 Mar 23, 2026
df91ee3
#recipebot Add NeMo pretraining A4 Llama3.1 70b recipe
Alina-PANG Mar 25, 2026
cac0500
chore: Migrate gsutil usage to gcloud storage
gurusai-voleti Feb 18, 2026
059fd3d
chore: update
gurusai-voleti Feb 18, 2026
6c52c65
Add Qwen3 235B A22B Megatron-Bridge pretraining recipe on A4X Slurm
notabee Mar 27, 2026
c73722c
Add Qwen3 235B A22B pretraining recipe on A4X Slurm Cluster
notabee Mar 30, 2026
5f17384
Qwen3 235B A22B, Qwen 2.5 VL 7B & Llama 3.1 405B config files and lau…
hmhv1222 Mar 30, 2026
d955554
Support running VL models with trtllm-launcher.sh
hmhv1222 Mar 30, 2026
@@ -126,7 +126,7 @@ First, you'll configure your local environment. These steps are required once be
 git clone https://github.com/ai-hypercomputer/gpu-recipes.git
 cd gpu-recipes
 export REPO_ROOT=$(pwd)
-export RECIPE_ROOT=$REPO_ROOT/inference/a4/single-host-serving/sglang
+export RECIPE_ROOT=$REPO_ROOT/inference/a4/single-host-serving/sglang/deepseek-r1-671b
 ```

<a name="configure-vars"></a>
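The hunk above narrows `RECIPE_ROOT` from the generic sglang directory to a model-specific subdirectory. A minimal sketch of the setup as changed by this diff (the `deepseek-r1-671b` path comes from the added line; the default for `REPO_ROOT` and the echo are illustrative additions, not part of the recipe):

```shell
# Sketch of the one-time local setup, per the updated README.
# Assumes gpu-recipes has already been cloned into the current directory.
REPO_ROOT="${REPO_ROOT:-$PWD/gpu-recipes}"
# RECIPE_ROOT now includes the model-specific subdirectory:
RECIPE_ROOT="$REPO_ROOT/inference/a4/single-host-serving/sglang/deepseek-r1-671b"
export REPO_ROOT RECIPE_ROOT
echo "Recipe directory: $RECIPE_ROOT"
```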
@@ -450,4 +450,4 @@ To avoid incurring further charges, clean up the resources you created.
 3. (Optional) Delete the built Docker image from Artifact Registry if no longer needed.
 4. (Optional) Delete Cloud Build logs.
 5. (Optional) Clean up files in your GCS bucket if benchmarking was performed.
-6. (Optional) Delete the [test environment](#test-environment) provisioned including GKE cluster.
\ No newline at end of file
+6. (Optional) Delete the [test environment](#test-environment) provisioned including GKE cluster.
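The optional cleanup steps can be scripted. A hedged sketch, in dry-run form: the cluster, region, and bucket names below are placeholders (not taken from the recipe), and `DRY_RUN=1` only prints the commands. It uses `gcloud storage` rather than `gsutil`, consistent with the gsutil-migration commit in this PR.

```shell
#!/bin/sh
# Dry-run sketch of the optional cleanup steps; all names below are placeholders.
DRY_RUN=1
CLUSTER_NAME="my-gke-cluster"        # placeholder, not from the recipe
REGION="us-central1"                 # placeholder
BUCKET="gs://my-benchmark-bucket"    # placeholder

run() {
  # Print the command in dry-run mode; execute it otherwise.
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Step 5: clean up benchmark artifacts in the GCS bucket.
run gcloud storage rm --recursive "$BUCKET/benchmarks"
# Step 6: delete the GKE test cluster.
run gcloud container clusters delete "$CLUSTER_NAME" --region "$REGION" --quiet
```

`gcloud container clusters delete` and `gcloud storage rm` are standard gcloud commands; additional flags may be needed depending on how the test environment was provisioned.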
@@ -59,4 +59,4 @@ network:
 gibVersion: us-docker.pkg.dev/gce-ai-infra/gpudirect-gib/nccl-plugin-gib:v1.0.5
 ncclSettings:
 - name: NCCL_DEBUG
-  value: "WARN"
\ No newline at end of file
+  value: "WARN"
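When NCCL behavior needs troubleshooting, the `WARN` level in this config can be raised. A sketch assuming the same `ncclSettings` schema; `NCCL_DEBUG` and `NCCL_DEBUG_SUBSYS` are standard NCCL environment variables, but the specific values shown are illustrative:

```yaml
ncclSettings:
  - name: NCCL_DEBUG
    value: "INFO"            # more verbose than WARN; use for troubleshooting only
  - name: NCCL_DEBUG_SUBSYS
    value: "INIT,NET"        # NCCL's standard subsystem filter
```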