A4 & A4X TRTLLM GKE single-host inference benchmarking recipes by hmhv1222 · Pull Request #146 · AI-Hypercomputer/gpu-recipes

hmhv1222 · 2026-03-13T20:39:25Z

A4 and A4X TensorRT-LLM (TRTLLM) single-host inference benchmarking recipes for GKE, along with README files and config YAML files for benchmarking with specific configurations and parallelism hyperparameters.

Tested and validated on A4 and A4X GKE nodes for TRTLLM inference benchmarking with specific combinations of tensor parallelism (TP), pipeline parallelism (PP), expert parallelism (EP), number of GPUs, input and output sequence lengths, and precision.
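As a rough illustration of how these parallelism hyperparameters relate on a single host, the sketch below checks whether a TP/PP/EP combination fits on one node. It assumes 8 B200 GPUs per A4 host and 4 GPUs per A4X host, a single-host world size of TP × PP, and that EP must divide TP (an assumption modeled on how TensorRT-LLM carves MoE expert parallelism out of the TP group; check each recipe's own config). The helper `check_single_host_layout` and the `GPUS_PER_NODE` table are hypothetical and are not part of the recipes.

```python
# Hypothetical helper: sanity-check a single-host TP/PP/EP layout.
# Assumptions (not taken from the recipes): A4 hosts expose 8 GPUs,
# A4X hosts expose 4 GPUs, world size = TP * PP, and EP must divide TP.
GPUS_PER_NODE = {"a4": 8, "a4x": 4}


def check_single_host_layout(machine: str, tp: int, pp: int, ep: int = 1) -> int:
    """Return the number of GPUs the layout uses, or raise ValueError."""
    if min(tp, pp, ep) < 1:
        raise ValueError("TP, PP and EP must all be >= 1")
    world_size = tp * pp
    available = GPUS_PER_NODE[machine]
    if world_size > available:
        raise ValueError(
            f"TP*PP={world_size} exceeds {available} GPUs on one {machine} host"
        )
    if tp % ep != 0:
        raise ValueError(f"EP={ep} must divide TP={tp}")
    return world_size


print(check_single_host_layout("a4", tp=8, pp=1, ep=8))  # 8
print(check_single_host_layout("a4x", tp=4, pp=1))       # 4
```

A layout such as TP=8 on an A4X host would raise a `ValueError`, since it needs more GPUs than one host provides under these assumptions.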

Each model YAML file shows only one combination of parallelism hyperparameters and configs. Input and output lengths need to be adjusted according to the model and its configs.
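When adjusting input and output lengths, one basic constraint is that the input sequence length (ISL) plus the output sequence length (OSL) must fit within the model's maximum sequence length, because each request's KV cache holds both the prompt and the generated tokens. The sketch below assumes a hypothetical config dict with `isl`, `osl` and `max_seq_len` keys; the recipe YAML files may use different field names, and `fit_sequence_lengths` is illustrative only.

```python
# Hypothetical sketch: shrink the output length so ISL + OSL fits within the
# model's maximum sequence length. Key names are illustrative, not the recipe schema.
def fit_sequence_lengths(config: dict) -> dict:
    isl, osl, max_seq_len = config["isl"], config["osl"], config["max_seq_len"]
    if isl >= max_seq_len:
        raise ValueError(
            f"ISL={isl} leaves no room for output within max_seq_len={max_seq_len}"
        )
    adjusted = dict(config)  # copy so the caller's config is left unchanged
    # Keep the prompt length fixed and reduce the generation budget if needed.
    adjusted["osl"] = min(osl, max_seq_len - isl)
    return adjusted


cfg = {"isl": 8192, "osl": 1024, "max_seq_len": 8192 + 512}
print(fit_sequence_lengths(cfg)["osl"])  # 512
```

Keeping ISL fixed and trimming OSL is just one policy; a benchmark targeting a specific ISL/OSL pair would instead raise `max_seq_len` if the model and memory budget allow it.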


hmhv1222 added 2 commits March 13, 2026 20:36

A4 & A4X TRTLLM GKE Single-Host Inference Recipes, ReadMe and Config Files

67630c6

Add benchmarking configs warning paragraphs in TRTLLM inference ReadMe

875e36b

hmhv1222 force-pushed the mhvictorhau-20260303-a4-a4x-trtllm-benchmarking branch from 3ba13b6 to 875e36b on March 13, 2026 20:42

Remove memory resources constraint in A4X serving-launcher.yaml and change default number of GPUs in A4 TRTLLM inference recipe

5db07bb

hmhv1222 requested a review from Chris113113 March 13, 2026 20:49

depksingh and others added 25 commits March 30, 2026 21:35

Adding WAN Recipe for A4 (#144)

ae7651c

Adding recipe for A4 for WAN

Adding the A4x wan recipe (#143)

b7b3c25

* Adding the A4x wan recipe

a4x cs llama 405b

fbe277b

Adding a recipe for Llama3.1-70B with gbs 256 scaled Recipe from 16 nodes. (#148)

d0b19ef

* Adding recipe for Llama3.1-70B with gbs 256
* Remove '-32node' suffix from WORKLOAD_NAME
* Add HF_TOKEN environment variable in launcher.sh

Added environment variable for Hugging Face token.

Update submit.slurm

4eacf50

Add Qwen3 235B A22B recipe on 16-node B200 #recipebot

813c0e2

Update README to match Qwen3 235B 16-node recipe template

19427f9

Update README to match the exact template structure

cbac6ec

temp

2e9bd49

Add Qwen3 235B A22B FP8MX GBS8192 recipe on 32-node B200 #recipebot

a382c47

Update README to use exact strict formatting of template

fc2904c

Move 16node-BF16-GBS4096 files into recipe directory #recipebot

82eb1b9

Add GPT-OSS 120B 8-node BF16 recipe #recipebot

b9ad8d2

Add DeepSeek V3 32-node FP8MX SEQ4096 GBS4096 recipe #recipebot

eb44b3e

Restructure qwen3 recipes by nemo version

0055f1d

Add QWEN3 235B 32-node BF16 SEQ4096 GBS4096 recipe #recipebot

faf33ad

Add Llama 3.1 405B FP8CS 16-node recipe on a4 (nemo2602)

afa3033

Add recipe for Llama3.1 405B FP8CS B200 256GPUs

cdbf60b

Add Qwen3 30B Nemo Pretraining recipe on A4

047bce8

Update DeepSeek V3 32-node BF16 NEMO26.02 recipe with optimized launcher flags #recipebot

178c961

Move 32node recipe into nemo2602 subdirectory and update README

ad5cb23

Move 32node NEMO25.11 recipe into nemo2511 subdirectory

edb4a0e

#recipebot Add NeMo pretraining A4 Llama3.1 70b recipe

df91ee3

chore: Migrate gsutil usage to gcloud storage

cac0500

chore: update

059fd3d

notabee and others added 4 commits March 30, 2026 21:36

Add Qwen3 235B A22B Megatron-Bridge pretraining recipe on A4X Slurm

6c52c65

Add Qwen3 235B A22B pretraining recipe on A4X Slurm Cluster

c73722c

Qwen3 235B A22B, Qwen 2.5 VL 7B & Llama 3.1 405B config files and launcher modification

5f17384

Support running VL models with trtllm-launcher.sh

d955554


hmhv1222 wants to merge 32 commits into main from mhvictorhau-20260303-a4-a4x-trtllm-benchmarking


8 participants
