[AIMIGRAPHX-885][AIMIGRAPGX-987] Use External Stream Contexts#4775
[AIMIGRAPHX-885][AIMIGRAPGX-987] Use External Stream Contexts#4775TedThemistokleous wants to merge 13 commits intodevelopfrom
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #4775 +/- ##
===========================================
+ Coverage 92.49% 92.81% +0.32%
===========================================
Files 583 584 +1
Lines 29562 30104 +542
===========================================
+ Hits 27343 27941 +598
+ Misses 2219 2163 -56
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
Enables MIGraphX GPU async execution to run directly on a caller-provided HIP stream (external stream contexts) to reduce internal synchronization/stalls, and adds GPU tests to validate external-stream behavior and fallback behavior.
Changes:
- Add external-stream support in
gpu::context/hip_device::stream(override stream used by the context during async eval). - Adjust async synchronization logic (
wait_for/finish_on) to avoid creating/using an extra internal stream when an external stream is provided. - Add a comprehensive new GPU test suite covering external stream override, async eval behavior, and fallback paths.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
src/targets/gpu/include/migraphx/gpu/context.hpp |
Adds external stream override plumbing and modifies async sync behavior to use caller stream. |
test/gpu/external_stream.cpp |
Adds new GPU tests for external stream override, async eval correctness, and state cleanup expectations. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
bdevorem
left a comment
There was a problem hiding this comment.
couple questions, thanks Ted
| auto host_result = migraphx::gpu::from_gpu(gout); | ||
| verify_data(host_result, out_shape, 12.0f); | ||
| } | ||
|
|
There was a problem hiding this comment.
the PR description says external libs reset to default stream on clear/finish, but the tests seem to mostly assert get_queue() or has_external_stream() and numerical results. I think none would fail if MIOpen or rocBLAS were left bound to the customer stream?
d2cfb6b to
563c78b
Compare
bdevorem
left a comment
There was a problem hiding this comment.
I think a rebase/merge will solve the CI problems. lgtm otherwise
pfultz2
left a comment
There was a problem hiding this comment.
I dont think we should do this in the wait_for and finish_on functions as it could change the semantics of the function. Instead we should add a use_queue method to the context interface and use that directly.
Sure rebased this off develop |
7b92f18 to
a164acb
Compare
Okay let me add this. This is similar and just do the create/set in the use_queue or use_external() thread?
So is the idea then run_async() -> bind the stream if its not the null/default stream? otherwise we just create an internal stream for regular run()? |
|
|
Added changes based on Paul's comments so that we odn't modify wait_for, finish_on and just use a use_queue and set_queue_context. |
CharlieL7
left a comment
There was a problem hiding this comment.
There's the tools/include/context.hpp file that looks like the file that generates the type erasure interface. The use_queue interface should probably be generated from there.
Motivation
Customer workload seeing some stalls during inference. This allows us to use the customer hipSteam passed to context via run_async so that we don't need to internally sync and manage a thread within MIGraphX. This allows the synchronization to be handled externally.
As an added benefit if not external thread is used we should fall back to the old fork_join run on the GPU where we internally create a stream to sync events onto.
Technical Details
Adds additional conditions to the wait_for , finish_on calls in context.cpp such that we avoid new stream creation for async runs while also simplifying much of the code.
Test cases have been added for this to ensure we don't break existing functionality.
Additional code added to ensure we set external libraries like BLAS and MIOPEN to use the default stream on clear
Changelog Category
Add a
CHANGELOG.mdentry for any option other thanNot Applicable