Add an implementation of ParticleCopyPlan::doHandShake that uses one-sided communication from MPI-3 #5227
Open
atmyers wants to merge 7 commits into AMReX-Codes:development
This is one of the possible optimizations raised in Issue #4179.
This passes all tests locally and on Perlmutter. However, the one-sided version of doHandShake doesn't seem to be a performance win in the tests I ran.
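For context, the general shape of an MPI-3 one-sided handshake is shown below. This is a minimal standalone sketch of the pattern, not the actual AMReX code, and the function name `oneSidedHandShake` is made up here: each rank exposes a per-sender array of message sizes in an RMA window, senders `MPI_Put` their sizes directly into the destinations' windows, and after the closing fence every rank reads off its senders locally, with no collective over message counts.

```cpp
#include <mpi.h>
#include <map>
#include <vector>

// Given the sizes of the messages this rank will send (keyed by destination
// rank), return the sizes of the messages it will receive (keyed by source).
std::map<int, long> oneSidedHandShake (const std::map<int, long>& snd_sizes,
                                       MPI_Comm comm)
{
    int nprocs, rank;
    MPI_Comm_size(comm, &nprocs);
    MPI_Comm_rank(comm, &rank);

    // One slot per potential sender; -1 means "no message from that rank".
    std::vector<long> rcv_sizes(nprocs, -1L);

    MPI_Win win;
    MPI_Win_create(rcv_sizes.data(), nprocs*sizeof(long), sizeof(long),
                   MPI_INFO_NULL, comm, &win);

    // Origin buffers must stay valid until the closing fence completes.
    std::vector<long> origin;
    origin.reserve(snd_sizes.size());

    MPI_Win_fence(0, win);
    for (const auto& kv : snd_sizes) {
        origin.push_back(kv.second);
        // Write our message size into slot [rank] of the destination's window.
        MPI_Put(&origin.back(), 1, MPI_LONG, kv.first, rank, 1, MPI_LONG, win);
    }
    MPI_Win_fence(0, win);
    MPI_Win_free(&win);

    // Any slot that was written identifies a sender and its message size.
    std::map<int, long> rcv_map;
    for (int src = 0; src < nprocs; ++src) {
        if (rcv_sizes[src] >= 0) { rcv_map[src] = rcv_sizes[src]; }
    }
    return rcv_map;
}
```

The sketch uses fence synchronization for simplicity; passive-target locking or post-start-complete-wait would also work and trade generality for synchronization cost.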
The new handshake is only needed for the global redistribution path. To trigger this path, as a first test I made a particle redistribution benchmark in which 1% of the particles jump to a random location in the domain each step. For this communication pattern, the one-sided handshake performs about the same as the default reduce/scatter version on up to 128 nodes, and is slower beyond that.
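The reduce/scatter pattern being compared against works like this: each rank marks its destinations in a 0/1 vector, `MPI_Reduce_scatter` sums those vectors column-wise so each rank learns how many messages it will receive, and the sizes then arrive as point-to-point messages from `MPI_ANY_SOURCE`. Again, a hedged sketch of the general pattern rather than the literal AMReX implementation:

```cpp
#include <mpi.h>
#include <map>
#include <vector>

std::map<int, long> reduceScatterHandShake (const std::map<int, long>& snd_sizes,
                                            MPI_Comm comm)
{
    int nprocs, rank;
    MPI_Comm_size(comm, &nprocs);
    MPI_Comm_rank(comm, &rank);

    // flags[i] == 1 iff this rank will send a message to rank i.
    std::vector<int> flags(nprocs, 0);
    for (const auto& kv : snd_sizes) { flags[kv.first] = 1; }

    // Every rank contributes one element per destination; each rank gets back
    // the sum of its own column, i.e. its number of incoming messages.
    int nrcvs = 0;
    std::vector<int> counts(nprocs, 1);
    MPI_Reduce_scatter(flags.data(), &nrcvs, counts.data(), MPI_INT, MPI_SUM, comm);

    // Exchange the actual sizes with point-to-point messages.
    std::vector<MPI_Request> reqs(snd_sizes.size());
    std::vector<long> origin;
    origin.reserve(snd_sizes.size());
    int i = 0;
    for (const auto& kv : snd_sizes) {
        origin.push_back(kv.second);
        MPI_Isend(&origin.back(), 1, MPI_LONG, kv.first, 0, comm, &reqs[i++]);
    }

    std::map<int, long> rcv_map;
    for (int n = 0; n < nrcvs; ++n) {
        long sz;
        MPI_Status status;
        MPI_Recv(&sz, 1, MPI_LONG, MPI_ANY_SOURCE, 0, comm, &status);
        rcv_map[status.MPI_SOURCE] = sz;
    }
    MPI_Waitall(static_cast<int>(reqs.size()), reqs.data(), MPI_STATUSES_IGNORE);
    return rcv_map;
}
```

Note that the reduce/scatter variant always pays for one collective over all ranks regardless of how sparse the pattern is, which is why the one-sided version could plausibly win for sparse patterns.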
The above communication pattern has each rank, on average, sending some particles to every other rank, which is basically an all-to-all and therefore a bad case for the one-sided version. So I ran another test modeled on what we do for load balancing in WarpX. Here there are 2 boxes per GPU, so on 1024 ranks there are 2048 boxes. Instead of moving the particles, each step I change the distribution map randomly, so that on average each rank sends particles to 2 other ranks and receives from 2. This kind of sparse communication pattern should be a good case for the one-sided version. However, even here the one-sided handshake performs about the same as the default.
None of these runs included the fix in PR #5260, which would improve the overall redistribute scaling but not affect the handshake time.
Overall, I think the regime in which the one-sided version would be expected to win is relatively small: it needs to trigger the global redistribution path but with a sparse communication pattern. And even in that case, I didn't see a win in the tests I ran.
However, since the one-sided method is off by default, I think we should merge this as an option: the timings depend on how good the RMA support is in a given MPI implementation, and this may work better on other systems.
The proposed changes: