A4 & A4X TRTLLM GKE single-host inference benchmarking recipes #146

Open
hmhv1222 wants to merge 32 commits into main from mhvictorhau-20260303-a4-a4x-trtllm-benchmarking

Conversation

@hmhv1222
Collaborator

Adds A4 and A4X TRTLLM GKE single-host inference benchmarking recipes, with README files and config YAML files for benchmarking specific configurations and parallelism hyperparameters.

Tested and validated on A4 and A4X GKE nodes for TRTLLM inference benchmarking with specific combinations of TP, PP, EP, number of GPUs, input and output sequence length, and precision.

Each model YAML file shows only a specific combination of parallelism hyperparameters and configs. Input and output lengths need to be adjusted according to the model and its config.
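
When changing the parallelism settings in a YAML file, a quick sanity check before launching can save a failed run. The sketch below is illustrative only and is not part of the recipes: `ParallelismConfig` and `check_single_host` are hypothetical names, the fields are not the actual YAML keys, and both rules it checks (TP × PP equals the GPU count, and EP evenly divides TP) are assumptions based on common conventions rather than anything confirmed by this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelismConfig:
    """Hypothetical container; field names are illustrative, not the recipes' YAML keys."""
    tp: int        # tensor-parallel size
    pp: int        # pipeline-parallel size
    ep: int        # expert-parallel size (1 for dense models)
    num_gpus: int  # GPUs requested on the single host


def check_single_host(cfg: ParallelismConfig) -> list[str]:
    """Return a list of problems; an empty list means the combination is self-consistent."""
    if min(cfg.tp, cfg.pp, cfg.ep, cfg.num_gpus) < 1:
        return ["all sizes must be >= 1"]

    problems = []
    # Assumption: every GPU holds exactly one (TP rank, PP stage) pair.
    if cfg.tp * cfg.pp != cfg.num_gpus:
        problems.append(
            f"tp * pp = {cfg.tp * cfg.pp}, but num_gpus = {cfg.num_gpus}"
        )
    # Assumption: expert parallelism subdivides the tensor-parallel group.
    if cfg.tp % cfg.ep != 0:
        problems.append(f"ep = {cfg.ep} does not divide tp = {cfg.tp}")
    return problems


if __name__ == "__main__":
    cfg = ParallelismConfig(tp=8, pp=1, ep=8, num_gpus=8)
    issues = check_single_host(cfg)
    print("OK" if not issues else "\n".join(issues))
```

If the recipes use a different mapping between these sizes, the two checks should be adjusted to match.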

hmhv1222 force-pushed the mhvictorhau-20260303-a4-a4x-trtllm-benchmarking branch from 3ba13b6 to 875e36b on March 13, 2026 20:42
…hange default number of GPUs in A4 TRTLLM inference recipe
hmhv1222 requested a review from Chris113113 on March 13, 2026 20:49
depksingh and others added 25 commits March 30, 2026 21:35
Adding recipe for A4 for WAN
* Adding the A4X WAN recipe
…odes. (#148)

* Adding recipe for Llama3.1-70B with gbs 256

* Remove '-32node' suffix from WORKLOAD_NAME

* Add HF_TOKEN environment variable in launcher.sh

Added environment variable for Hugging Face token.
8 participants