IB host-no-atomic: GDRCopy + mlx5dv Data Direct for memory-consistent low-latency signaling #753
Open
Contributor
Author
/azp run mscclpp-ut
Azure Pipelines successfully started running 1 pipeline(s).
Binyang2014
reviewed
Feb 27, 2026
src/core/connection.cc
Outdated
Comment on lines +223 to +227
#if defined(DEBUG_CUFLUSH) && defined(MSCCLPP_USE_CUDA)
// cuFlush path: read from imm_data then flush NIC->GPU write pipeline for visibility.
newValueHost = static_cast<uint64_t>(qp->getRecvWcImmData(i));
MSCCLPP_CUTHROW(cuFlushGPUDirectRDMAWrites(CU_FLUSH_GPU_DIRECT_RDMA_WRITES_TARGET_CURRENT_CTX,
                                           CU_FLUSH_GPU_DIRECT_RDMA_WRITES_TO_OWNER));
Contributor
Do we need to keep this code here?
  // Direct host-side write to GPU memory via GDRCopy BAR1 mapping
  remoteUpdateDstAddrMap_->copyTo(&newValueHost, sizeof(uint64_t));
} else {
  *dstPtr = newValueHost;
Contributor
Is this valid for CUDA? Maybe we can throw an error if the dstAddrMap is invalid in a CUDA environment.
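The guard suggested in this comment could look roughly like the sketch below. It is a standalone illustration, not the code in this PR: `FakeGdrMap`, `writeSignal`, and the `requireMap` flag are hypothetical names, and the mapping is modeled with plain host memory so the example runs without a GPU or GDRCopy. The assumption being illustrated is that in a CUDA build the destination is device memory, so the plain `*dstPtr = ...` store is only meaningful when the destination is host-addressable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Hypothetical stand-in for a GDRCopy BAR1 mapping; not the MSCCL++ class.
struct FakeGdrMap {
  void* hostView = nullptr;  // CPU-visible alias of the destination (assumption)
  bool valid() const { return hostView != nullptr; }
  void copyTo(const void* src, std::size_t bytes) { std::memcpy(hostView, src, bytes); }
};

// Hypothetical helper: write the new semaphore value to its destination.
// requireMap models "this is a CUDA build, so dst is a device address".
inline void writeSignal(FakeGdrMap* map, uint64_t* dst, uint64_t value, bool requireMap) {
  if (map != nullptr && map->valid()) {
    // Preferred path: go through the mapping.
    map->copyTo(&value, sizeof(value));
    return;
  }
  if (requireMap) {
    // Fail loudly instead of dereferencing a device pointer from the host.
    throw std::runtime_error("GDRCopy mapping unavailable; refusing direct store to a GPU address");
  }
  // Fallback: dst is assumed to be host-addressable here.
  assert(dst != nullptr);
  *dst = value;
}
```

With this shape, a call such as `writeSignal(nullptr, dstPtr, v, true)` throws rather than silently storing through `dstPtr`, while `writeSignal(nullptr, dstPtr, v, false)` keeps the direct-store fallback for host-addressable destinations.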
src/core/connection.cc
Outdated
#endif

// Read dstGpuAddr from the local stored address (set by setRemoteUpdateDstAddr)
uint64_t dstGpuAddr = remoteUpdateDstAddr_;
Contributor
I'm a bit confused about this variable. If we use host2hostSemaphore, is this address a host address?
Fix potential memory inconsistency in IB host-no-atomic mode, and reduce latency overhead by introducing GDRCopy.