79 commits
e711b62 Initial plan (Copilot, Feb 11, 2026)
c881bc5 Replace gtest/gtest.h with framework.hpp in all unit tests (Copilot, Feb 11, 2026)
e227fdc Convert mp_unit tests from gtest to framework.hpp (Copilot, Feb 11, 2026)
1e32e17 Address code review comments (Copilot, Feb 11, 2026)
eafa6fb Add custom test framework and code coverage support (Copilot, Feb 11, 2026)
3d8a2e7 Add --gtest_filter support to framework (Copilot, Feb 11, 2026)
a10aff5 Address code review feedback (Copilot, Feb 11, 2026)
1818709 Fix CodeQL workflow by disabling test builds (Copilot, Feb 11, 2026)
5657e4a Initial plan for fixing test build with GPU bypass (Copilot, Feb 11, 2026)
0eae34c Fix test framework for building with Docker (Copilot, Feb 11, 2026)
4823583 Move FailHelper and SkipHelper into mscclpp::test namespace (Copilot, Feb 11, 2026)
403b2fb Remove unnecessary CMake build artifacts from PR (Copilot, Feb 11, 2026)
305d157 Remove PerfTestResult and reuse TestResult directly (Copilot, Feb 11, 2026)
b1f458e Convert test framework identifiers from snake_case to camelCase (Copilot, Feb 11, 2026)
6da12fa Comprehensive plan for refactoring (Copilot, Feb 11, 2026)
7e4365f Add performance test filtering and remove HTML coverage (Copilot, Feb 11, 2026)
b59196b Integrate perf tests into unit_tests and add CI coverage step (Copilot, Feb 11, 2026)
ba0451a Remove build2 CMake artifacts from repository (Copilot, Feb 11, 2026)
50f6a24 Remove test/perf/ directory completely (Copilot, Feb 11, 2026)
e26f8ab Address PR review comments (Copilot, Feb 11, 2026)
7003fec Simplify filter matching to use substring matching (Copilot, Feb 11, 2026)
30b9891 simplifying (chhwang, Feb 19, 2026)
b6ce0f2 simplify (chhwang, Feb 19, 2026)
d2efc2f coverage update (chhwang, Feb 19, 2026)
4afbf78 minor (chhwang, Feb 19, 2026)
e40c72b license text update (chhwang, Feb 19, 2026)
bed85b5 codecov upload (chhwang, Feb 19, 2026)
4d9acea badge (chhwang, Feb 19, 2026)
b693d1b lint issue (chhwang, Feb 19, 2026)
2b4adcc fix lint (chhwang, Feb 19, 2026)
b64536f Merge branch 'main' into copilot/remove-gtest-use-custom-framework (chhwang, Feb 19, 2026)
dcdd3fe update UT CI (chhwang, Feb 20, 2026)
caeec75 updates (chhwang, Feb 20, 2026)
b9609f8 add coverage flags (chhwang, Feb 20, 2026)
41695ba Merge branch 'main' into copilot/remove-gtest-use-custom-framework (chhwang, Feb 20, 2026)
c4afbe1 Merge branch 'main' into copilot/remove-gtest-use-custom-framework (chhwang, Feb 23, 2026)
04ebd9b fix coverage file path (chhwang, Feb 23, 2026)
6c2bc8f coverage fix (chhwang, Feb 23, 2026)
d0c709e Fix Codecov token usage in coverage upload step (chhwang, Feb 23, 2026)
edda25d Merge branch 'main' into copilot/remove-gtest-use-custom-framework (chhwang, Feb 23, 2026)
2f02d38 Merge branch 'main' into copilot/remove-gtest-use-custom-framework (chhwang, Feb 24, 2026)
2adf4a4 use variable group (chhwang, Feb 24, 2026)
2f27d7d Update coverage report to exclude additional directories in lcov command (chhwang, Feb 24, 2026)
d88ee8d Refine coverage report to include only mscclpp source and include dir… (chhwang, Feb 24, 2026)
11e27e2 Update coverage report commands to handle errors and adjust paths (chhwang, Feb 24, 2026)
eb99a26 Merge branch 'main' into copilot/remove-gtest-use-custom-framework (chhwang, Feb 27, 2026)
8c3a436 update CI (chhwang, Feb 27, 2026)
f4b8574 Merge branch 'main' into copilot/remove-gtest-use-custom-framework (chhwang, Mar 3, 2026)
bbb9c10 Update Docker image (chhwang, Mar 6, 2026)
60ff32c updates (chhwang, Mar 6, 2026)
00583da separate pipeline for codecov (chhwang, Mar 6, 2026)
c699b8a az pipeline refactoring (chhwang, Mar 7, 2026)
284d913 Merge branch 'main' into copilot/remove-gtest-use-custom-framework (chhwang, Mar 7, 2026)
75ac8be fix (chhwang, Mar 7, 2026)
e0c7ddb fix (chhwang, Mar 7, 2026)
c40a233 fix (chhwang, Mar 7, 2026)
375bc13 fix (chhwang, Mar 7, 2026)
bcb392f updates (chhwang, Mar 8, 2026)
ea1dd65 fix (chhwang, Mar 8, 2026)
d6a6fa2 simplified (chhwang, Mar 8, 2026)
a9cf938 fix (chhwang, Mar 9, 2026)
6647338 debugging (chhwang, Mar 10, 2026)
7a87c2c debugging (chhwang, Mar 10, 2026)
cf505d7 debugging (chhwang, Mar 10, 2026)
757c0ec debugging (chhwang, Mar 11, 2026)
e2a5be4 debugging (chhwang, Mar 11, 2026)
2a705f5 fix merge (chhwang, Mar 11, 2026)
a38bd9d Merge branch 'main' into copilot/remove-gtest-use-custom-framework (chhwang, Mar 11, 2026)
e2a9692 fix merge (chhwang, Mar 11, 2026)
2c4bab8 fix (chhwang, Mar 16, 2026)
a937ce4 debugging (chhwang, Mar 16, 2026)
d66d7e4 debugging (chhwang, Mar 17, 2026)
5a65cc7 debugging (chhwang, Mar 17, 2026)
2297a3d updates (chhwang, Mar 18, 2026)
2756221 update (chhwang, Mar 18, 2026)
bff76d5 Fix TearDown() handling and replace assert() in perf tests (Copilot, Mar 18, 2026)
6082648 fix for npkit (chhwang, Mar 18, 2026)
79a0149 updates (chhwang, Mar 18, 2026)
dfab8b9 Merge branch 'main' into copilot/remove-gtest-use-custom-framework (chhwang, Mar 21, 2026)
93 changes: 93 additions & 0 deletions .azure-pipelines/codecov.yml
@@ -0,0 +1,93 @@
+trigger:
+  branches:
+    include:
+    - main
+    - release/*
+  paths:
+    exclude:
+    - .devcontainer/**
+    - .github/**
+    - apps/**
+    - docker/**
+    - docs/**
+    - '**/*.md'
+
+pr:
+  branches:
+    include:
+    - main
+    - release/*
+  drafts: false
+  paths:
+    exclude:
+    - .devcontainer/**
+    - .github/**
+    - apps/**
+    - docker/**
+    - docs/**
+    - '**/*.md'
+
+jobs:
+- job: CodeCoverageA100
+  timeoutInMinutes: 40
+  pool:
+    name: msccl-ci
+  variables:
+  - group: mscclpp
+  strategy:
+    matrix:
+      cuda12:
+        containerImage: ghcr.io/microsoft/mscclpp/mscclpp:base-dev-cuda12.9
+
+  container:
+    image: $(containerImage)
+
+  steps:
+  - template: templates/codecov.yml
+    parameters:
+      subscription: mscclpp-ci
+      vmssName: mscclpp-ci
+      gpuArch: '80'
+
+- job: CodeCoverageH100
+  timeoutInMinutes: 40
+  pool:
+    name: msccl-ci-h100
+  variables:
+  - group: mscclpp
+  strategy:
+    matrix:
+      cuda12:
+        containerImage: ghcr.io/microsoft/mscclpp/mscclpp:base-dev-cuda12.9
+
+  container:
+    image: $(containerImage)
+
+  steps:
+  - template: templates/codecov.yml
+    parameters:
+      subscription: mscclpp-ci-h100
+      vmssName: mscclpp-h100-ci
+      gpuArch: '90'
+
+- job: CodeCoverageMI300X
+  timeoutInMinutes: 40
+  pool:
+    name: msccl-ci-mi300x
+  variables:
+  - group: mscclpp
+  strategy:
+    matrix:
+      rocm6_2:
+        containerImage: ghcr.io/microsoft/mscclpp/mscclpp:base-dev-rocm6.2
+
+  container:
+    image: $(containerImage)
+
+  steps:
+  - template: templates/codecov.yml
+    parameters:
+      subscription: mscclpp-ci-mi300x
+      vmssName: mscclpp-mi300x-ci
+      platform: rocm
+      gpuArch: gfx942
6 changes: 2 additions & 4 deletions .azure-pipelines/integration-test.yml
@@ -41,11 +41,10 @@ jobs:
     image: $(containerImage)

   steps:
-  - template: templates/integration-test.yaml
+  - template: templates/integration-test.yml
     parameters:
       subscription: mscclpp-ci
       vmssName: mscclpp-ci
-      sshKeySecureFile: mscclpp.pem
       gpuArch: '80'

 - job: IntegrationTestH100
@@ -61,10 +60,9 @@ jobs:
     image: $(containerImage)

   steps:
-  - template: templates/integration-test.yaml
+  - template: templates/integration-test.yml
     parameters:
       subscription: mscclpp-ci-h100
       vmssName: mscclpp-h100-ci
-      sshKeySecureFile: mscclpp.pem
       perfBaselineFile: test/deploy/perf_ndmv5.jsonl
       gpuArch: '90'
164 changes: 38 additions & 126 deletions .azure-pipelines/multi-nodes-test.yml
@@ -37,33 +37,6 @@ jobs:
     image: $[ variables['containerImage'] ]

   steps:
-  - task: Bash@3
-    name: Build
-    displayName: Build
-    inputs:
-      targetType: 'inline'
-      script: |
-        mkdir build && cd build
-        cmake -DCMAKE_BUILD_TYPE=Release -DMSCCLPP_BYPASS_GPU_CHECK=ON -DMSCCLPP_USE_CUDA=ON -DMSCCLPP_BUILD_TESTS=ON ..
-        make -j
-      workingDirectory: '$(System.DefaultWorkingDirectory)'
-
-  - task: DownloadSecureFile@1
-    name: SshKeyFile
-    displayName: Download key file
-    inputs:
-      secureFile: mscclpp-ssh.key
-
-  - task: Bash@3
-    name: InstallPackages
-    displayName: Install Packages
-    inputs:
-      targetType: 'inline'
-      script: |
-        sudo apt-get update -y
-        sudo apt-get install pssh -y
-        curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
-
   - task: Bash@3
     displayName: Add HostEntry
     inputs:
@@ -77,107 +50,46 @@ jobs:
           echo "Entry already exists, nothing to do."
         fi

-  - task: AzureCLI@2
-    name: StartVMSS
-    displayName: Start VMSS
-    inputs:
-      azureSubscription: msccl-it
-      scriptType: bash
-      scriptLocation: inlineScript
-      inlineScript: |
-        az vmss start --name mscclit-vmss --resource-group msccl-IT
+  - template: templates/deploy.yml
+    parameters:
+      subscription: msccl-it
+      vmssName: mscclit-vmss
+      resourceGroup: msccl-IT

-  - task: Bash@3
-    name: DeployTestEnv
-    displayName: Deploy Test Env
-    inputs:
-      targetType: filePath
-      filePath: test/deploy/deploy.sh
-      workingDirectory: '$(System.DefaultWorkingDirectory)'
+  - template: templates/run-remote-task.yml
+    parameters:
+      name: RunMscclppTest
+      displayName: Run multi-nodes mscclpp-test
+      runRemoteArgs: '--hostfile $(System.DefaultWorkingDirectory)/test/deploy/hostfile --host mscclit-000000 --user azureuser'
+      remoteScript: |
+        bash /root/mscclpp/test/deploy/run_tests.sh mscclpp-test

-  - task: Bash@3
-    name: RunMscclppTest
-    displayName: Run multi-nodes mscclpp-test
-    inputs:
-      targetType: 'inline'
-      script: |
-        set -e
-        HOSTFILE=$(System.DefaultWorkingDirectory)/test/mscclpp-test/deploy/hostfile
-        SSH_OPTION="StrictHostKeyChecking=no"
-        KeyFilePath=${SSHKEYFILE_SECUREFILEPATH}
-        rm -rf output/*
-        mkdir -p output
-        touch output/mscclit-000000
-        tail -f output/mscclit-000000 &
-        CHILD_PID=$!
-        parallel-ssh -t 0 -H mscclit-000000 -l azureuser -x "-i ${KeyFilePath}" \
-          -O $SSH_OPTION -o output 'sudo docker exec -t mscclpp-test bash /root/mscclpp/test/deploy/run_tests.sh mscclpp-test'
-        kill $CHILD_PID
-
-  - task: Bash@3
-    name: RunMultiNodeUnitTest
-    displayName: Run multi-nodes unit tests
-    inputs:
-      targetType: 'inline'
-      script: |
-        set -e
-        HOSTFILE=$(System.DefaultWorkingDirectory)/test/mscclpp-test/deploy/hostfile
-        SSH_OPTION="StrictHostKeyChecking=no"
-        KeyFilePath=${SSHKEYFILE_SECUREFILEPATH}
-        rm -rf output/*
-        mkdir -p output
-        touch output/mscclit-000000
-        tail -f output/mscclit-000000 &
-        CHILD_PID=$!
-        parallel-ssh -t 0 -H mscclit-000000 -l azureuser -x "-i ${KeyFilePath}" \
-          -O $SSH_OPTION -o output 'sudo docker exec -t mscclpp-test bash /root/mscclpp/test/deploy/run_tests.sh mp-ut'
-        kill $CHILD_PID
+  - template: templates/run-remote-task.yml
+    parameters:
+      name: RunMultiNodeUnitTest
+      displayName: Run multi-nodes unit tests
+      runRemoteArgs: '--hostfile $(System.DefaultWorkingDirectory)/test/deploy/hostfile --host mscclit-000000 --user azureuser'
+      remoteScript: |
+        bash /root/mscclpp/test/deploy/run_tests.sh mp-ut

-  - task: Bash@3
-    name: RunMultiNodePythonTests
-    displayName: Run multi-nodes python tests
-    inputs:
-      targetType: 'inline'
-      script: |
-        set -e
-        HOSTFILE=$(System.DefaultWorkingDirectory)/test/mscclpp-test/deploy/hostfile
-        SSH_OPTION="StrictHostKeyChecking=no"
-        KeyFilePath=${SSHKEYFILE_SECUREFILEPATH}
-        rm -rf output/*
-        mkdir -p output
-        touch output/mscclit-000000
-        tail -f output/mscclit-000000 &
-        CHILD_PID=$!
-        parallel-ssh -t 0 -H mscclit-000000 -l azureuser -x "-i ${KeyFilePath}" \
-          -O $SSH_OPTION -o output 'sudo docker exec -t mscclpp-test bash /root/mscclpp/test/deploy/run_tests.sh pytests'
-        kill $CHILD_PID
+  - template: templates/run-remote-task.yml
+    parameters:
+      name: RunMultiNodePythonTests
+      displayName: Run multi-nodes python tests
+      runRemoteArgs: '--hostfile $(System.DefaultWorkingDirectory)/test/deploy/hostfile --host mscclit-000000 --user azureuser'
+      remoteScript: |
+        bash /root/mscclpp/test/deploy/run_tests.sh pytests

-  - task: Bash@3
-    name: RunMultiNodePythonBenchmark
-    displayName: Run multi-nodes python benchmark
-    inputs:
-      targetType: 'inline'
-      script: |
-        set -e
-        HOSTFILE=$(System.DefaultWorkingDirectory)/test/mscclpp-test/deploy/hostfile
-        SSH_OPTION="StrictHostKeyChecking=no"
-        KeyFilePath=${SSHKEYFILE_SECUREFILEPATH}
-        rm -rf output/*
-        mkdir -p output
-        touch output/mscclit-000000
-        tail -f output/mscclit-000000 &
-        CHILD_PID=$!
-        parallel-ssh -t 0 -H mscclit-000000 -l azureuser -x "-i ${KeyFilePath}" \
-          -O $SSH_OPTION -o output 'sudo docker exec -t mscclpp-test bash /root/mscclpp/test/deploy/run_tests.sh py-benchmark'
-        kill $CHILD_PID
+  - template: templates/run-remote-task.yml
+    parameters:
+      name: RunMultiNodePythonBenchmark
+      displayName: Run multi-nodes python benchmark
+      runRemoteArgs: '--hostfile $(System.DefaultWorkingDirectory)/test/deploy/hostfile --host mscclit-000000 --user azureuser'
+      remoteScript: |
+        bash /root/mscclpp/test/deploy/run_tests.sh py-benchmark

-  - task: AzureCLI@2
-    name: StopVMSS
-    displayName: Deallocate VMSS
-    condition: always()
-    inputs:
-      azureSubscription: msccl-it
-      scriptType: bash
-      scriptLocation: inlineScript
-      inlineScript: |
-        az vmss deallocate --name mscclit-vmss --resource-group msccl-IT
+  - template: templates/stop.yml
+    parameters:
+      subscription: msccl-it
+      vmssName: mscclit-vmss
+      resourceGroup: msccl-IT
@@ -40,11 +40,10 @@ jobs:
     image: $(containerImage)

   steps:
-  - template: templates/nccl-test.yaml
+  - template: templates/nccl-test.yml
     parameters:
       subscription: mscclpp-ci
       vmssName: mscclpp-ci
-      sshKeySecureFile: mscclpp.pem
       nvccGencode: "-gencode=arch=compute_80,code=sm_80"

 - job: NcclTestH100
@@ -61,9 +60,8 @@ jobs:
     image: $(containerImage)

   steps:
-  - template: templates/nccl-test.yaml
+  - template: templates/nccl-test.yml
     parameters:
       subscription: mscclpp-ci-h100
       vmssName: mscclpp-h100-ci
-      sshKeySecureFile: mscclpp.pem
       nvccGencode: "-gencode=arch=compute_90,code=sm_90"
3 changes: 1 addition & 2 deletions .azure-pipelines/rccl-api-test.yml
@@ -40,9 +40,8 @@ jobs:
     image: $(containerImage)

   steps:
-  - template: templates/rccl-test.yaml
+  - template: templates/rccl-test.yml
     parameters:
       subscription: mscclpp-ci-mi300x
       vmssName: mscclpp-mi300x-ci
-      sshKeySecureFile: mscclpp.pem
       gpuArch: gfx942