
[AIMIGRAPHX-885][AIMIGRAPGX-987] Use External Stream Contexts#4775

Open
TedThemistokleous wants to merge 10 commits into develop from use_external_contexts

Conversation

@TedThemistokleous
Collaborator

Motivation

A customer workload is seeing some stalls during inference. This change allows us to use the customer's hipStream passed to the context via run_async, so that we don't need to internally sync and manage a thread within MIGraphX; synchronization is handled externally instead.

As an added benefit, if no external stream is used we fall back to the old fork_join run on the GPU, where we internally create a stream to sync events onto.

Technical Details

Adds additional conditions to the wait_for and finish_on calls in context.cpp so that we avoid new stream creation for async runs, while also simplifying much of the code.

Test cases have been added to ensure we don't break existing functionality.
Additional code ensures external libraries like BLAS and MIOpen are set back to the default stream on clear.

Changelog Category

Add a CHANGELOG.md entry for any option other than Not Applicable

    • Added: New functionality.
    • Changed: Changes to existing functionality.
    • Removed: Functionality or support that has been removed. (Compared to a previous release)
    • Optimized: Component performance that has been optimized or improved.
    • Resolved Issues: Known issues from a previous version that have been resolved.
    • Not Applicable: This PR is not to be included in the changelog.

@codecov

codecov Bot commented Apr 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #4775   +/-   ##
========================================
  Coverage    92.49%   92.49%           
========================================
  Files          583      583           
  Lines        29562    29562           
========================================
  Hits         27343    27343           
  Misses        2219     2219           

see 20 files with indirect coverage changes


Contributor

Copilot AI left a comment


Pull request overview

Enables MIGraphX GPU async execution to run directly on a caller-provided HIP stream (external stream contexts) to reduce internal synchronization/stalls, and adds GPU tests to validate external-stream behavior and fallback behavior.

Changes:

  • Add external-stream support in gpu::context/hip_device::stream (override stream used by the context during async eval).
  • Adjust async synchronization logic (wait_for / finish_on) to avoid creating/using an extra internal stream when an external stream is provided.
  • Add a comprehensive new GPU test suite covering external stream override, async eval behavior, and fallback paths.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
src/targets/gpu/include/migraphx/gpu/context.hpp: Adds external stream override plumbing and modifies async sync behavior to use the caller's stream.
test/gpu/external_stream.cpp: Adds new GPU tests for external stream override, async eval correctness, and state cleanup expectations.


Comment on lines +390 to +396
if(not get_stream().has_external_stream())
{
    get_stream().record(finish_event.get());
    auto status = hipStreamWaitEvent(queue.get<hipStream_t>(), finish_event.get(), 0);
    if(status != hipSuccess)
        MIGRAPHX_THROW("Failed to wait on event: " + hip_error(status));
}

Copilot AI Apr 15, 2026


finish_on() skips all work when an external stream is active, but it also never restores the stream state. Since program::eval() calls wait_for()/finish_on() around async execution, this means an async eval will leave the GPU context permanently bound to the external stream (affecting later sync evals and finish()). finish_on() (or a dedicated scope guard) should clear the external stream and restore library handles back to the internal/default stream after the async run completes.

Suggested change
-if(not get_stream().has_external_stream())
-{
-    get_stream().record(finish_event.get());
-    auto status = hipStreamWaitEvent(queue.get<hipStream_t>(), finish_event.get(), 0);
-    if(status != hipSuccess)
-        MIGRAPHX_THROW("Failed to wait on event: " + hip_error(status));
-}
+if(get_stream().has_external_stream())
+{
+    get_stream().set_external_stream(nullptr);
+    return;
+}
+get_stream().record(finish_event.get());
+auto status = hipStreamWaitEvent(queue.get<hipStream_t>(), finish_event.get(), 0);
+if(status != hipSuccess)
+    MIGRAPHX_THROW("Failed to wait on event: " + hip_error(status));

Comment on lines +374 to +383
auto* ext = queue.get<hipStream_t>();
if(ext == nullptr)
{
    auto status = hipEventRecord(begin_event.get(), ext);
    if(status != hipSuccess)
        MIGRAPHX_THROW("Failed to record: " + hip_error(status));
    get_stream().wait(begin_event.get());
}
else
{

Copilot AI Apr 15, 2026


wait_for() calls queue.get<hipStream_t>() unconditionally. If the caller passes a null stream via execution_environment{nullptr, true} (which is a reasonable way to request the fallback/event path), the any_ptr was constructed with a typed nullptr and any_ptr::get() will hit assert(not ti or ptr != nullptr) in debug builds. Consider checking queue.unsafe_get() == nullptr first and treating that as the null-stream fallback, only calling get<hipStream_t>() when the pointer is non-null.

Suggested change
-auto* ext = queue.get<hipStream_t>();
-if(ext == nullptr)
-{
-    auto status = hipEventRecord(begin_event.get(), ext);
-    if(status != hipSuccess)
-        MIGRAPHX_THROW("Failed to record: " + hip_error(status));
-    get_stream().wait(begin_event.get());
-}
-else
-{
+if(queue.unsafe_get() == nullptr)
+{
+    auto status = hipEventRecord(begin_event.get(), nullptr);
+    if(status != hipSuccess)
+        MIGRAPHX_THROW("Failed to record: " + hip_error(status));
+    get_stream().wait(begin_event.get());
+}
+else
+{
+    auto* ext = queue.get<hipStream_t>();

Member

@bdevorem bdevorem left a comment


couple questions, thanks Ted

Comment thread test/gpu/external_stream.cpp Outdated
auto host_result = migraphx::gpu::from_gpu(gout);
verify_data(host_result, out_shape, 12.0f);
}

Member


The PR description says external libs are reset to the default stream on clear/finish, but the tests seem to mostly assert get_queue() or has_external_stream() and numerical results. I think none of them would fail if MIOpen or rocBLAS were left bound to the customer stream?

Member

@bdevorem bdevorem left a comment


I think a rebase/merge will solve the CI problems. lgtm otherwise

Collaborator

@pfultz2 pfultz2 left a comment


I don't think we should do this in the wait_for and finish_on functions, as it could change the semantics of those functions. Instead we should add a use_queue method to the context interface and use that directly.

@TedThemistokleous
Collaborator Author

I think a rebase/merge will solve the CI problems. lgtm otherwise

Sure, rebased this off develop.

@TedThemistokleous
Collaborator Author

TedThemistokleous commented Apr 17, 2026

I don't think we should do this in the wait_for and finish_on functions as it could change the semantics of the function. Instead we should add a use_queue method to the context interface and use that directly.

Okay, let me add this. This is similar; just do the create/set in the use_queue or use_external() thread?

I don't think we should do this in the wait_for and finish_on functions as it could change the semantics of the function. Instead we should add a use_queue method to the context interface and use that directly.

So is the idea then: run_async() binds the stream if it's not the null/default stream, and otherwise we just create an internal stream for a regular run()?

@TedThemistokleous TedThemistokleous changed the title [AIMIGRAPHX-885] Use External Stream Contexts [AIMIGRAPHX-885][AIMIGRAPGX-987] Use External Stream Contexts Apr 24, 2026
@causten
Collaborator

causten commented Apr 29, 2026

Test             Batch  New Rate (bcc94e)  Old Rate (cdbdbc)  Diff  Status
resnet50v1_fp16  1      nan                992.67             nan

Regressions detected 🔴

@causten
Collaborator

causten commented Apr 29, 2026

Test Status Result
bert-mrpc-onnx PASSED: MIGraphX meets tolerance
bert-mrpc-tf PASSED: MIGraphX meets tolerance
pytorch-examples-wlang-gru PASSED: MIGraphX meets tolerance
pytorch-examples-wlang-lstm PASSED: MIGraphX meets tolerance
dlrm-criteoterabyte PASSED: MIGraphX meets tolerance
agentmodel PASSED: MIGraphX meets tolerance
unet PASSED: MIGraphX meets tolerance
resnet50v1 PASSED: MIGraphX meets tolerance
bert_base_cased_fp16 PASSED: MIGraphX meets tolerance
bert_large_uncased_fp16 🔴 FAILED: MIGraphX is not within tolerance - check verbose output
bert_large PASSED: MIGraphX meets tolerance
yolov5s PASSED: MIGraphX meets tolerance
tinyllama PASSED: MIGraphX meets tolerance
vicuna-fastchat PASSED: MIGraphX meets tolerance
whisper-tiny-encoder PASSED: MIGraphX meets tolerance
whisper-tiny-decoder PASSED: MIGraphX meets tolerance
distilgpt2_fp16 PASSED: MIGraphX meets tolerance
llama2_7b PASSED: MIGraphX meets tolerance
qwen1.5-7b PASSED: MIGraphX meets tolerance
phi3-3.8b PASSED: MIGraphX meets tolerance
llama3-8b PASSED: MIGraphX meets tolerance
whisper-large-encoder ERROR - check error output
traceback
Traceback (most recent call last):
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 360, in
    main()
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 211, in main
    model = migraphx.parse_onnx(model_name, default_dim_value=batch)
RuntimeError: /data/src/include/migraphx/op/convolution.hpp:103: normalize_compute_shape: CONVOLUTION: mismatched channel numbers
whisper-large-decoder PASSED: MIGraphX meets tolerance
mistral-7b PASSED: MIGraphX meets tolerance
FLUX.1-schnell PASSED: MIGraphX meets tolerance

@TedThemistokleous
Collaborator Author

Added changes based on Paul's comments so that we don't modify wait_for and finish_on, and instead just use use_queue and set_queue_context.

