Skip to content

[ace6tao2] Linux Proactor Improvements#2524

Draft
simpsont-oci wants to merge 3 commits intoDOCGroup:ace6tao2from
simpsont-oci:io_uring_proactor
Draft

[ace6tao2] Linux Proactor Improvements#2524
simpsont-oci wants to merge 3 commits intoDOCGroup:ace6tao2from
simpsont-oci:io_uring_proactor

Conversation

@simpsont-oci
Copy link
Copy Markdown
Contributor

@simpsont-oci simpsont-oci commented Apr 3, 2026

A few things:

  • Improve general test support / coverage for Proactor implementations (backends) by introducing common Proactor_Test_Backend parsing and instantiation support and having all Proactor tests make use of it.
  • Add Proactor Stress test and general API contract test
  • Fix several stability and correctness issues in POSIX Proactor implementations exposed by new test coverage
  • Add new Linux io_uring Proactor implementation (currently seems to outperform all POSIX Proactors on Linux, especially in the new stress test).

Summary by CodeRabbit

Release Notes

  • New Features

    • Added Linux io_uring support as a high-performance asynchronous I/O backend option.
  • Bug Fixes

    • Improved POSIX asynchronous I/O operation reliability and resource cleanup.
    • Enhanced file transmit and datagram read/write operation handling.
    • Fixed socket resource management in asynchronous connect operations.
  • Tests

    • Added proactor contract and stress tests for comprehensive backend validation.
    • Expanded test suite with backend selection support for all proactor implementations.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 3, 2026

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 7.41% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Linux Proactor Improvements' accurately summarizes the main changes, which center on enhancing Linux-based proactor implementations through new io_uring support, test infrastructure, and stability fixes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ACE/tests/Proactor_Scatter_Gather_Test.cpp (1)

1338-1365: ⚠️ Potential issue | 🟡 Minor

Document the new -t option in print_usage().

The parser accepts backend selection now, but the usage text still omits it, so the error path doesn't tell users how to invoke the new option.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ACE/tests/Proactor_Scatter_Gather_Test.cpp` around lines 1338 - 1365, The
usage text is missing documentation for the new -t option; update the
print_usage() function to include a concise description of "-t <backend>" (or
similar) and explain accepted backend values (as parsed by
Proactor_Test_Backend::parse_type) so users see how to select a backend when the
parser hits the error path; add the flag and a short example/valid values list
to the existing usage string returned/printed by print_usage().
🧹 Nitpick comments (2)
ACE/ace/Uring_Asynch_IO.h (1)

145-148: Consider ACE container alternatives to std::set.

Using std::set for pending_results_ is functional, but ACE traditionally uses its own containers like ACE_Unbounded_Set for consistency. However, std::set's O(log n) operations for insert/erase/find may be preferable here for tracking pending async operations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ACE/ace/Uring_Asynch_IO.h` around lines 145 - 148, The declaration of
pending_results_ uses std::set<ACE_Uring_Asynch_Result*> which diverges from ACE
container conventions; replace it with
ACE_Unbounded_Set<ACE_Uring_Asynch_Result*> (and add the corresponding `#include`
<ace/Unbounded_Set.h>) and update all usages of pending_results_
(insert/erase/find iterations) to the ACE_Unbounded_Set API (e.g., insert,
remove, find/seek as appropriate), or if you intentionally need ordered/O(log n)
semantics retain std::set but add a comment near pending_results_ and in
ACE_Uring_Asynch_Result usage explaining the performance/order requirement to
justify keeping the STL container.
ACE/ace/Uring_Asynch_IO.cpp (1)

51-55: Consider adding null-safety to handler() method.

If handler_proxy_ is ever an empty Proxy_Ptr, calling get() returns nullptr and the subsequent ->handler() dereference would crash. While this may follow existing ACE patterns assuming valid proxies, adding a defensive check would improve robustness.

Suggested defensive check
 ACE_Handler *
 ACE_Uring_Asynch_Result::handler (void) const
 {
-  return this->handler_proxy_.get ()->handler ();
+  ACE_Handler::Proxy *proxy = this->handler_proxy_.get ();
+  return proxy ? proxy->handler () : 0;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ACE/ace/Uring_Asynch_IO.cpp` around lines 51 - 55, The
ACE_Uring_Asynch_Result::handler() currently dereferences
this->handler_proxy_.get() without checking for null; modify
ACE_Uring_Asynch_Result::handler() to first obtain a pointer from
handler_proxy_.get(), check if it's null, and if so return nullptr (or an
appropriate default), otherwise call and return ptr->handler(); reference the
method ACE_Uring_Asynch_Result::handler and the member handler_proxy_ (and its
get()) when locating the code to update.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ACE/ace/POSIX_Proactor.cpp`:
- Around line 1083-1085: The no-arg handle_events() was changed to use a 1000ms
timeout causing busy polling; restore its original blocking behavior so
ACE_Proactor::proactor_run_event_loop() sleeps until work arrives. Locate the
call to handle_events_i in ACE_Proactor::handle_events() (currently invoked as
handle_events_i(1000)) and change it to use the blocking/infinite-wait variant
(i.e. pass the infinite/blocking timeout or call the blocking overload) so idle
proactor threads do not poll for work.
- Around line 1093-1103: post_completion added lazy init via
ensure_notify_manager(), but cancel_aio() and start_deferred_aio() still call
putq_result() directly so notify_completion() can return success without writing
to the pipe; ensure the notify-manager is created before any enqueue. Modify
cancel_aio() and start_deferred_aio() to call ensure_notify_manager() and handle
non-zero failure before calling putq_result(), or move the
ensure_notify_manager() call into putq_result()/enqueue path so
aiocb_notify_pipe_manager_ is guaranteed initialized whenever a result is
queued; keep notify_completion() behavior consistent and return error if
ensure_notify_manager() fails.
- Around line 545-551: The change replaces the real handler proxy with a
null_handler_proxy and turns the POSIX wakeup completion into a no-op, which
prevents dispatching ACE_Handler::handle_wakeup() and diverges from the WIN32
behavior; restore the original proxy usage by constructing
ACE_POSIX_Wakeup_Completion with the actual handler proxy (not
null_handler_proxy) so wakeup_completion will invoke handle_wakeup as before
(update the same pattern at the other occurrence noted around lines 2119-2121),
i.e., pass the real ACE_Handler::Proxy_Ptr instance used for the target handler
into ACE_POSIX_Wakeup_Completion instead of the null proxy.

In `@ACE/ace/Uring_Asynch_IO.cpp`:
- Around line 1172-1185: The send path sets result->msg()->msg_name to
remote_addr.get_addr() which can point at a temporary/stack buffer and cause a
use-after-free when the io_uring operation completes; modify
ACE_Uring_Asynch_Write_Dgram_Result to own a sockaddr buffer (e.g. add a member
like remote_addr_ and remote_addr_len_), copy remote_addr.get_addr() into that
buffer in the ACE_Uring_Asynch_Write_Dgram_Result constructor, and in
ACE_Uring_Asynch_Write_Dgram::send set result->msg()->msg_name =
result->remote_addr_ and result->msg()->msg_namelen = result->remote_addr_len_
before submitting with submit_result so the async io_uring sendmsg uses the
owned copy instead of the caller's buffer.
- Around line 952-996: The error log prefixes incorrectly reference
ACE_POSIX_Asynch_Connect::connect_i; update the ACELIB_ERROR messages in this
Uring implementation to use ACE_Uring_Asynch_Connect::connect_i instead —
specifically change the three occurrences around the setsockopt (near the
result->set_error(errno) after setsockopt), the bind (after ACE_OS::bind
failure), and the set_flags (after ACE::set_flags failure) so the log lines
reflect the ACE_Uring_Asynch_Connect::connect_i prefix while keeping the
existing message format and errno handling intact.

In `@ACE/ace/Uring_Proactor.cpp`:
- Around line 65-75: The two ACE_Uring_Proactor::handle_events overloads
currently return the raw batch size from process_cqes (e.g., 2..32) which breaks
the documented contract; change both handle_events() and the other call sites
that directly return process_cqes (e.g., the variants at the other noted
locations) to map the process_cqes return value into the documented semantic:
return 1 if process_cqes returns a positive count, 0 if it returns 0 (no
completions), and propagate -1 on error; locate the calls to process_cqes in
ACE_Uring_Proactor::handle_events(Time_Value&),
ACE_Uring_Proactor::handle_events(), and the other occurrences mentioned and
replace the direct return with this normalized mapping so callers receive only
1/0/-1.
- Around line 27-38: The init path leaves a partially initialized
ACE_Uring_Proactor when ::io_uring_queue_init fails; ensure the object is not
usable by either cleaning up and marking it fully uninitialized or by failing
construction: when io_uring_queue_init returns <0, call the appropriate cleanup
for ring_ (e.g., io_uring_queue_exit or equivalent), clear/zero ring_, set
is_initialized_ to false, and propagate the failure (return an error/throw) so
callers (which call process_cqes(), post_wakeup_completions(), etc.) never
receive a half-initialized instance; update
ACE_Uring_Proactor::io_uring_queue_init error path and any factory/creation code
to handle the propagated failure.
- Around line 116-177: The code incorrectly treats two completions from the same
handler as a duplicate and returns before calling io_uring_cqe_seen(), which can
leave a CQE at the head of the ring and starve later completions; change the
duplicate-detection to track ACE_Uring_Asynch_Result* (the actual pending key)
instead of result->handler(), and ensure io_uring_cqe_seen(&this->ring_, cqe) is
always called before any early return or continue. Concretely: replace
dispatched_handlers (std::set<const void *>) with a set of
ACE_Uring_Asynch_Result* (or equivalent name), check
dispatched_results.find(result) instead of dispatched_handlers.find(handler),
call io_uring_cqe_seen(&this->ring_, cqe) before skipping/returning, and remove
the handler-based logic that returns when the same handler is seen so that
multiple CQEs for the same handler are treated as independent completions.

In `@ACE/include/makeinclude/platform_linux.GNU`:
- Around line 72-74: When $(uring) is 1 the makefile currently adds -luring but
does not define the ACE_HAS_IO_URING macro required by guarded code (e.g., files
Uring_Proactor.h and Uring_Asynch_IO.h); modify the conditional that appends
LIBS so that it also adds a preprocessor definition (e.g., append
-DACE_HAS_IO_URING to the compiler flags such as CPPFLAGS or CXXFLAGS) inside
the same ifeq ($(uring),1) block so the guarded symbols are compiled when
linking liburing.

In `@ACE/tests/Proactor_Scatter_Gather_Test.cpp`:
- Around line 875-879: The extra call to initiate_write_file() in the
final-flush branch can fail and leave the writer alive with receiver_count_ == 0
and io_count_ == 0; after calling initiate_write_file() you must check its
return value and, on failure, explicitly perform the same completion/cleanup
path so the writer can terminate. Modify the branch that contains the
initiate_write_file() call to capture its return value and, if non-zero, invoke
the object's completion/cleanup helper (e.g., call the existing
handle_write_complete() or the code path that decrements
receiver_count_/io_count_ and triggers shutdown) so the event loop is not left
without a terminator.

In `@ACE/tests/Proactor_Stress_Test.cpp`:
- Around line 177-190: schedule_one() currently increments pending_ after
calling proactor_.schedule_timer(), which lets a zero-delay timer fire before
bookkeeping and causes wait_for_idle() races; move the bookkeeping ahead of
arming the timer: under the existing lock (ACE_GUARD_RETURN on this->lock_),
increment this->pending_ before calling this->proactor_.schedule_timer(...),
then call schedule_timer; if schedule_timer returns -1, decrement this->pending_
and increment this->schedule_failures_ to roll back, otherwise return success as
before; references: schedule_one(), proactor_.schedule_timer(), pending_,
schedule_failures_, wait_for_idle().

In `@ACE/tests/Proactor_Test_IPV6.cpp`:
- Around line 1733-1735: The current set_proactor_type uses only
Proactor_Test_Backend::parse_type(ptype, proactor_type) which merely parses
names and can allow selecting a backend that isn't build-available, causing
task1.start() to fail and run_main() to hang; restore the previous availability
gate here by checking the backend availability after parsing (e.g., call the
backend availability helper for proactor_type or
Proactor_Test_Backend::is_available/proactor_available(proactor_type)) and
return false (or perform a hard exit) when the chosen backend cannot be
instantiated so tests fail fast instead of waiting forever.

In `@ACE/tests/Proactor_UDP_Test.cpp`:
- Around line 1955-1957: set_proactor_type currently calls
Proactor_Test_Backend::parse_type(ptype, proactor_type) which only validates the
token and allows platform-incompatible backends through; update
set_proactor_type to also check backend availability (reuse the previous
availability check in Proactor_Test_Backend or call a new
is_available()/validate_availability method) and return false if the backend is
unsupported so the test fails fast, or alternatively ensure run_main() detects
startup failure (e.g., task1.start() failure) and returns immediately; reference
Proactor_Test_Backend::parse_type, set_proactor_type, run_main, and task1.start
when applying the change.

---

Outside diff comments:
In `@ACE/tests/Proactor_Scatter_Gather_Test.cpp`:
- Around line 1338-1365: The usage text is missing documentation for the new -t
option; update the print_usage() function to include a concise description of
"-t <backend>" (or similar) and explain accepted backend values (as parsed by
Proactor_Test_Backend::parse_type) so users see how to select a backend when the
parser hits the error path; add the flag and a short example/valid values list
to the existing usage string returned/printed by print_usage().

---

Nitpick comments:
In `@ACE/ace/Uring_Asynch_IO.cpp`:
- Around line 51-55: The ACE_Uring_Asynch_Result::handler() currently
dereferences this->handler_proxy_.get() without checking for null; modify
ACE_Uring_Asynch_Result::handler() to first obtain a pointer from
handler_proxy_.get(), check if it's null, and if so return nullptr (or an
appropriate default), otherwise call and return ptr->handler(); reference the
method ACE_Uring_Asynch_Result::handler and the member handler_proxy_ (and its
get()) when locating the code to update.

In `@ACE/ace/Uring_Asynch_IO.h`:
- Around line 145-148: The declaration of pending_results_ uses
std::set<ACE_Uring_Asynch_Result*> which diverges from ACE container
conventions; replace it with ACE_Unbounded_Set<ACE_Uring_Asynch_Result*> (and
add the corresponding `#include` <ace/Unbounded_Set.h>) and update all usages of
pending_results_ (insert/erase/find iterations) to the ACE_Unbounded_Set API
(e.g., insert, remove, find/seek as appropriate), or if you intentionally need
ordered/O(log n) semantics retain std::set but add a comment near
pending_results_ and in ACE_Uring_Asynch_Result usage explaining the
performance/order requirement to justify keeping the STL container.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 60d270fc-a2dd-4cab-b7ad-bac479ebb317

📥 Commits

Reviewing files that changed from the base of the PR and between 98e5a3f and c840f6b.

📒 Files selected for processing (23)
  • ACE/ace/Asynch_IO.h
  • ACE/ace/POSIX_Asynch_IO.cpp
  • ACE/ace/POSIX_Proactor.cpp
  • ACE/ace/POSIX_Proactor.h
  • ACE/ace/POSIX_Proactor.inl
  • ACE/ace/Proactor.cpp
  • ACE/ace/Uring_Asynch_IO.cpp
  • ACE/ace/Uring_Asynch_IO.h
  • ACE/ace/Uring_Proactor.cpp
  • ACE/ace/Uring_Proactor.h
  • ACE/ace/ace.mpc
  • ACE/include/makeinclude/platform_linux.GNU
  • ACE/tests/Proactor_Contract_Test.cpp
  • ACE/tests/Proactor_File_Test.cpp
  • ACE/tests/Proactor_Scatter_Gather_Test.cpp
  • ACE/tests/Proactor_Stress_Test.cpp
  • ACE/tests/Proactor_Test.cpp
  • ACE/tests/Proactor_Test_Backend.h
  • ACE/tests/Proactor_Test_IPV6.cpp
  • ACE/tests/Proactor_Timer_Test.cpp
  • ACE/tests/Proactor_UDP_Test.cpp
  • ACE/tests/run_test.lst
  • ACE/tests/tests.mpc

Comment on lines +545 to +551
ACE_Handler::Proxy_Ptr null_handler_proxy;

for (int ci = 0; ci < how_many; ci++)
{
ACE_NEW_RETURN
(wakeup_completion,
ACE_POSIX_Wakeup_Completion (this->wakeup_handler_.proxy ()),
ACE_POSIX_Wakeup_Completion (null_handler_proxy),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't silently drop the POSIX wakeup handler.

These changes replace the previous proxy with a null proxy and then turn wakeup completion into a no-op. That means the POSIX backend no longer dispatches ACE_Handler::handle_wakeup() at all, which is an observable contract change and now diverges from the WIN32 path.

Also applies to: 2119-2121

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ACE/ace/POSIX_Proactor.cpp` around lines 545 - 551, The change replaces the
real handler proxy with a null_handler_proxy and turns the POSIX wakeup
completion into a no-op, which prevents dispatching ACE_Handler::handle_wakeup()
and diverges from the WIN32 behavior; restore the original proxy usage by
constructing ACE_POSIX_Wakeup_Completion with the actual handler proxy (not
null_handler_proxy) so wakeup_completion will invoke handle_wakeup as before
(update the same pattern at the other occurrence noted around lines 2119-2121),
i.e., pass the real ACE_Handler::Proxy_Ptr instance used for the target handler
into ACE_POSIX_Wakeup_Completion instead of the null proxy.

Comment on lines +1083 to +1085
// Avoid an unbounded wait here so the event loop can observe
// end_event_loop() even if the notify-pipe wakeup is lost.
return this->handle_events_i (1000);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Keep the no-arg handle_events() blocking.

ACE_Proactor::proactor_run_event_loop() repeatedly calls this overload expecting it to sleep until work arrives. Changing it to a 1-second timeout turns idle proactor threads into a polling loop and papers over lost-wakeup bugs instead of fixing them.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ACE/ace/POSIX_Proactor.cpp` around lines 1083 - 1085, The no-arg
handle_events() was changed to use a 1000ms timeout causing busy polling;
restore its original blocking behavior so
ACE_Proactor::proactor_run_event_loop() sleeps until work arrives. Locate the
call to handle_events_i in ACE_Proactor::handle_events() (currently invoked as
handle_events_i(1000)) and change it to use the blocking/infinite-wait variant
(i.e. pass the infinite/blocking timeout or call the blocking overload) so idle
proactor threads do not poll for work.

Comment on lines +1093 to +1103
if (this->aiocb_notify_pipe_manager_ == 0)
return 0;

return this->aiocb_notify_pipe_manager_->notify ();
}

int
ACE_POSIX_AIOCB_Proactor::post_completion (ACE_POSIX_Asynch_Result *result)
{
if (this->ensure_notify_manager () != 0)
return -1;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Lazy notify-manager setup doesn't cover every queued-completion path.

post_completion() now initializes the pipe, but cancel_aio() and start_deferred_aio() still enqueue directly through putq_result(). If either path runs before the manager exists, notify_completion() returns success at Line 1093 without writing to the pipe, so the queued completion only surfaces because of the new polling fallback.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ACE/ace/POSIX_Proactor.cpp` around lines 1093 - 1103, post_completion added
lazy init via ensure_notify_manager(), but cancel_aio() and start_deferred_aio()
still call putq_result() directly so notify_completion() can return success
without writing to the pipe; ensure the notify-manager is created before any
enqueue. Modify cancel_aio() and start_deferred_aio() to call
ensure_notify_manager() and handle non-zero failure before calling
putq_result(), or move the ensure_notify_manager() call into
putq_result()/enqueue path so aiocb_notify_pipe_manager_ is guaranteed
initialized whenever a result is queued; keep notify_completion() behavior
consistent and return error if ensure_notify_manager() fails.

Comment on lines +952 to +996
ACELIB_ERROR ((LM_ERROR,
ACE_TEXT("ACE_POSIX_Asynch_Connect::connect_i: %p\n"),
ACE_TEXT("socket")));
}
else
{
// Reuse the address
int one = 1;
if (protocol_family != PF_UNIX &&
reuse_addr != 0 &&
ACE_OS::setsockopt (handle,
SOL_SOCKET,
SO_REUSEADDR,
(const char*) &one,
sizeof one) == -1 )
{
result->set_error (errno);
ACELIB_ERROR ((LM_ERROR,
ACE_TEXT("ACE_POSIX_Asynch_Connect::connect_i: %p\n"),
ACE_TEXT("setsockopt")));
}
}
}

if (result->error () == 0 && local_sap != ACE_Addr::sap_any)
{
sockaddr * laddr = reinterpret_cast<sockaddr *> (local_sap.get_addr ());
size_t size = local_sap.get_size ();

if (ACE_OS::bind (handle, laddr, size) == -1)
{
result->set_error (errno);
ACELIB_ERROR ((LM_ERROR,
ACE_TEXT("ACE_POSIX_Asynch_Connect::connect_i: %p\n"),
ACE_TEXT("bind")));
}
}

// set non blocking mode
if (result->error () == 0 && ACE::set_flags (handle, ACE_NONBLOCK) != 0)
{
result->set_error (errno);
ACELIB_ERROR ((LM_ERROR,
ACE_TEXT("ACE_POSIX_Asynch_Connect::connect_i: %p\n")
ACE_TEXT("set_flags")));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Incorrect error message prefix: should reference Uring, not POSIX.

These error messages were likely copied from the POSIX implementation but retain the wrong class name prefix. This will mislead developers during debugging.

Proposed fix
-              ACELIB_ERROR ((LM_ERROR,
-                         ACE_TEXT("ACE_POSIX_Asynch_Connect::connect_i: %p\n"),
-                         ACE_TEXT("socket")));
+              ACELIB_ERROR ((LM_ERROR,
+                         ACE_TEXT("ACE_Uring_Asynch_Connect::connect: %p\n"),
+                         ACE_TEXT("socket")));

Apply similar changes to lines 969-971 (setsockopt), 984-986 (bind), and 994-996 (set_flags).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ACE/ace/Uring_Asynch_IO.cpp` around lines 952 - 996, The error log prefixes
incorrectly reference ACE_POSIX_Asynch_Connect::connect_i; update the
ACELIB_ERROR messages in this Uring implementation to use
ACE_Uring_Asynch_Connect::connect_i instead — specifically change the three
occurrences around the setsockopt (near the result->set_error(errno) after
setsockopt), the bind (after ACE_OS::bind failure), and the set_flags (after
ACE::set_flags failure) so the log lines reflect the
ACE_Uring_Asynch_Connect::connect_i prefix while keeping the existing message
format and errno handling intact.

Comment on lines +1172 to +1185
ssize_t ACE_Uring_Asynch_Write_Dgram::send (ACE_Message_Block *message_block, size_t &/*number_of_bytes_sent*/, int flags, const ACE_Addr &remote_addr, const void *act, int, int)
{
ACE_Uring_Asynch_Write_Dgram_Result *result = 0;
ACE_NEW_RETURN (result, ACE_Uring_Asynch_Write_Dgram_Result (this->handler_proxy_, this->handle_, message_block, message_block->length (), flags, act, this->proactor_), -1);
result->msg ()->msg_name = (void *)remote_addr.get_addr ();
result->msg ()->msg_namelen = (socklen_t)remote_addr.get_size ();

ACE_GUARD_RETURN (ACE_Thread_Mutex, ace_mon, this->uring_proactor_->sq_mutex (), -1);
struct io_uring_sqe *sqe = this->uring_proactor_->get_sqe ();
if (!sqe) { delete result; errno = EAGAIN; return -1; }
::io_uring_prep_sendmsg (sqe, this->handle_, result->msg (), flags);
::io_uring_sqe_set_data (sqe, result);
return this->submit_result (result);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Potential use-after-free: remote_addr pointer may become invalid before async completion.

The msg_name pointer is set to point directly at the caller's remote_addr.get_addr() buffer. Since send() returns immediately while the actual sendmsg operation executes asynchronously via io_uring, if remote_addr is a stack variable or temporary, it may be destroyed before the kernel reads the address.

Compare with ACE_Uring_Asynch_Read_Dgram_Result which correctly stores remote_addr_ as a member field (line 1052). The write result should similarly copy the destination address into an internal buffer.

Proposed fix

Add a member to store the address in ACE_Uring_Asynch_Write_Dgram_Result:

 // In header file, add member:
+  struct sockaddr_storage remote_addr_;

 // In constructor or send():
-  result->msg ()->msg_name = (void *)remote_addr.get_addr ();
-  result->msg ()->msg_namelen = (socklen_t)remote_addr.get_size ();
+  ACE_OS::memcpy (&result->remote_addr_, remote_addr.get_addr (), remote_addr.get_size ());
+  result->msg ()->msg_name = &result->remote_addr_;
+  result->msg ()->msg_namelen = (socklen_t)remote_addr.get_size ();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ACE/ace/Uring_Asynch_IO.cpp` around lines 1172 - 1185, The send path sets
result->msg()->msg_name to remote_addr.get_addr() which can point at a
temporary/stack buffer and cause a use-after-free when the io_uring operation
completes; modify ACE_Uring_Asynch_Write_Dgram_Result to own a sockaddr buffer
(e.g. add a member like remote_addr_ and remote_addr_len_), copy
remote_addr.get_addr() into that buffer in the
ACE_Uring_Asynch_Write_Dgram_Result constructor, and in
ACE_Uring_Asynch_Write_Dgram::send set result->msg()->msg_name =
result->remote_addr_ and result->msg()->msg_namelen = result->remote_addr_len_
before submitting with submit_result so the async io_uring sendmsg uses the
owned copy instead of the caller's buffer.

Comment on lines +72 to +74
ifeq ($(uring),1)
LIBS += -luring
endif
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if ACE_HAS_IO_URING is defined elsewhere in the build system
rg -n 'ACE_HAS_IO_URING' ACE/include/makeinclude/ ACE/ace/config*.h 2>/dev/null || echo "Not found in makeinclude or config headers"

Repository: DOCGroup/ACE_TAO

Length of output: 102


🏁 Script executed:

# Find Uring_Proactor.h and check its contents
fd -name '*[Uu]ring*' ACE/

Repository: DOCGroup/ACE_TAO

Length of output: 290


🏁 Script executed:

# Search entire codebase for ACE_HAS_IO_URING
rg 'ACE_HAS_IO_URING' --no-heading

Repository: DOCGroup/ACE_TAO

Length of output: 1266


🏁 Script executed:

# Check the context around lines 72-74 in platform_linux.GNU
sed -n '60,90p' ACE/include/makeinclude/platform_linux.GNU

Repository: DOCGroup/ACE_TAO

Length of output: 666


🏁 Script executed:

# Check if similar patterns in platform_linux.GNU define preprocessor flags
rg -A2 'ifeq.*\$\(' ACE/include/makeinclude/platform_linux.GNU | head -40

Repository: DOCGroup/ACE_TAO

Length of output: 1028


🏁 Script executed:

# Search all files for where ACE_HAS_IO_URING is DEFINED (not just used)
rg 'define.*ACE_HAS_IO_URING|ACE_HAS_IO_URING.*=' --no-heading

Repository: DOCGroup/ACE_TAO

Length of output: 701


🏁 Script executed:

# Check if there's a configure script or other build scripts that might define it
find ACE -name 'configure*' -o -name '*.m4' -o -name 'config.h*' | head -20

Repository: DOCGroup/ACE_TAO

Length of output: 263


🏁 Script executed:

# Look at the broader context: is this a new feature being added? Check git history if available
git log --oneline -20 -- ACE/include/makeinclude/platform_linux.GNU 2>/dev/null || echo "Git history not available"

Repository: DOCGroup/ACE_TAO

Length of output: 84


🏁 Script executed:

# Check if ACE_HAS_IO_URING might be defined in generated headers or cmake files
find . -name 'config.h' -o -name 'CMakeLists.txt' -o -name '*.in' | grep -i config | head -10

Repository: DOCGroup/ACE_TAO

Length of output: 226


🏁 Script executed:

# Search for how other ACE_HAS_* macros are defined to understand the pattern
rg 'define.*ACE_HAS_AIO_CALLS' --no-heading

Repository: DOCGroup/ACE_TAO

Length of output: 11059


🏁 Script executed:

# Check the ACE config header patterns
fd 'config' ACE/ace | head -10

Repository: DOCGroup/ACE_TAO

Length of output: 340


🏁 Script executed:

# See if there's documentation about the uring feature
fd -i uring ACE | grep -E '\.(md|txt|rst|doc)$'

Repository: DOCGroup/ACE_TAO

Length of output: 42


🏁 Script executed:

# Check config-linux.h to see if ACE_HAS_IO_URING might be defined there
cat ACE/ace/config-linux.h | grep -A5 -B5 'IO_URING\|uring' || echo "Not found in config-linux.h"

Repository: DOCGroup/ACE_TAO

Length of output: 87


🏁 Script executed:

# Check all config-*.h files for any reference to uring or IO_URING
rg 'uring|IO_URING' ACE/ace/config-*.h

Repository: DOCGroup/ACE_TAO

Length of output: 283


🏁 Script executed:

# Check if there's any cmake or build configuration that might set this
fd 'CMakeLists.txt|MPC|*.cmake' ACE | xargs grep -l uring 2>/dev/null || echo "No cmake files with uring"

Repository: DOCGroup/ACE_TAO

Length of output: 427


🏁 Script executed:

# Final check: search the entire codebase for where ACE_HAS_IO_URING is defined with `#define`
rg '#define.*ACE_HAS_IO_URING' --no-heading

Repository: DOCGroup/ACE_TAO

Length of output: 42


Missing ACE_HAS_IO_URING preprocessor definition in makefile conditional.

The makefile links -luring when $(uring)=1, but doesn't define the ACE_HAS_IO_URING preprocessor macro. Code in Uring_Proactor.h, Uring_Asynch_IO.h, and other files is guarded by #if defined (ACE_HAS_AIO_CALLS) && defined (ACE_HAS_IO_URING), so without the macro definition, this code won't compile even with the library linked.

Proposed fix to add preprocessor definition
 ifeq ($(uring),1)
+  CPPFLAGS += -DACE_HAS_IO_URING
   LIBS += -luring
 endif
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
ifeq ($(uring),1)
LIBS += -luring
endif
ifeq ($(uring),1)
CPPFLAGS += -DACE_HAS_IO_URING
LIBS += -luring
endif
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ACE/include/makeinclude/platform_linux.GNU` around lines 72 - 74, When
$(uring) is 1 the makefile currently adds -luring but does not define the
ACE_HAS_IO_URING macro required by guarded code (e.g., files Uring_Proactor.h
and Uring_Asynch_IO.h); modify the conditional that appends LIBS so that it also
adds a preprocessor definition (e.g., append -DACE_HAS_IO_URING to the compiler
flags such as CPPFLAGS or CXXFLAGS) inside the same ifeq ($(uring),1) block so
the guarded symbols are compiled when linking liburing.

Comment on lines +875 to +879
if (0 != this->odd_chain_ || 0 != this->even_chain_)
{
this->initiate_write_file ();
return;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Handle a failed final flush explicitly.

If this extra initiate_write_file() call fails, the callback returns with receiver_count_ == 0, io_count_ == 0, and no more completions pending. That leaves the writer alive and the event loop with nothing left to terminate it.

Suggested fix
       if (0 != this->odd_chain_ || 0 != this->even_chain_)
         {
-          this->initiate_write_file ();
+          if (this->initiate_write_file () != 0)
+            {
+              ACE_Proactor::instance ()->end_event_loop ();
+              delete this;
+            }
           return;
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (0 != this->odd_chain_ || 0 != this->even_chain_)
{
this->initiate_write_file ();
return;
}
if (0 != this->odd_chain_ || 0 != this->even_chain_)
{
if (this->initiate_write_file () != 0)
{
ACE_Proactor::instance ()->end_event_loop ();
delete this;
}
return;
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ACE/tests/Proactor_Scatter_Gather_Test.cpp` around lines 875 - 879, The extra
call to initiate_write_file() in the final-flush branch can fail and leave the
writer alive with receiver_count_ == 0 and io_count_ == 0; after calling
initiate_write_file() you must check its return value and, on failure,
explicitly perform the same completion/cleanup path so the writer can terminate.
Modify the branch that contains the initiate_write_file() call to capture its
return value and, if non-zero, invoke the object's completion/cleanup helper
(e.g., call the existing handle_write_complete() or the code path that
decrements receiver_count_/io_count_ and triggers shutdown) so the event loop is
not left without a terminator.

Comment on lines +177 to +190
int schedule_one ()
{
long timer_id = this->proactor_.schedule_timer (*this,
0,
ACE_Time_Value::zero);
ACE_GUARD_RETURN (ACE_Thread_Mutex, guard, this->lock_, -1);
if (timer_id == -1)
{
++this->schedule_failures_;
return -1;
}

++this->pending_;
return 0;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Account for the timer before you arm it.

With zero-delay timers and 8 dispatch threads, the callback can run before the later ++pending_. That makes pending_ drift from the real outstanding count and turns wait_for_idle() into a flaky race.

💡 Move the bookkeeping ahead of `schedule_timer()`
 int schedule_one ()
 {
-  long timer_id = this->proactor_.schedule_timer (*this,
-                                                  0,
-                                                  ACE_Time_Value::zero);
-  ACE_GUARD_RETURN (ACE_Thread_Mutex, guard, this->lock_, -1);
+  {
+    ACE_GUARD_RETURN (ACE_Thread_Mutex, guard, this->lock_, -1);
+    ++this->pending_;
+  }
+
+  long timer_id = this->proactor_.schedule_timer (*this,
+                                                  0,
+                                                  ACE_Time_Value::zero);
   if (timer_id == -1)
     {
+      ACE_GUARD_RETURN (ACE_Thread_Mutex, guard, this->lock_, -1);
+      --this->pending_;
       ++this->schedule_failures_;
+      if (this->dispatch_scale_ == 0 && this->pending_ == 0)
+        this->idle_.broadcast ();
       return -1;
     }
 
-  ++this->pending_;
   return 0;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ACE/tests/Proactor_Stress_Test.cpp` around lines 177 - 190, schedule_one()
currently increments pending_ after calling proactor_.schedule_timer(), which
lets a zero-delay timer fire before bookkeeping and causes wait_for_idle()
races; move the bookkeeping ahead of arming the timer: under the existing lock
(ACE_GUARD_RETURN on this->lock_), increment this->pending_ before calling
this->proactor_.schedule_timer(...), then call schedule_timer; if schedule_timer
returns -1, decrement this->pending_ and increment this->schedule_failures_ to
roll back, otherwise return success as before; references: schedule_one(),
proactor_.schedule_timer(), pending_, schedule_failures_, wait_for_idle().

Comment on lines 1733 to +1735
set_proactor_type (const ACE_TCHAR *ptype)
{
if (!ptype)
return 0;

switch (ACE_OS::ace_toupper (*ptype))
{
case 'D':
proactor_type = DEFAULT;
return 1;
case 'A':
proactor_type = AIOCB;
return 1;
case 'I':
proactor_type = SIG;
return 1;
#if defined (sun)
case 'S':
proactor_type = SUN;
return 1;
#endif /* sun */
#if !defined (ACE_HAS_BROKEN_SIGEVENT_STRUCT)
case 'C':
proactor_type = CB;
return 1;
#endif /* !ACE_HAS_BROKEN_SIGEVENT_STRUCT */
default:
break;
}
return 0;
return Proactor_Test_Backend::parse_type (ptype, proactor_type) == 0;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Keep backend availability validation in this path.

parse_type() only parses names; it doesn't reject backends that this build can't instantiate. In this test that means task1.start() can fail and run_main() still waits forever for sessions that will never start. Please keep the old availability gate here, or add a hard exit on startup failure.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ACE/tests/Proactor_Test_IPV6.cpp` around lines 1733 - 1735, The current
set_proactor_type uses only Proactor_Test_Backend::parse_type(ptype,
proactor_type) which merely parses names and can allow selecting a backend that
isn't build-available, causing task1.start() to fail and run_main() to hang;
restore the previous availability gate here by checking the backend availability
after parsing (e.g., call the backend availability helper for proactor_type or
Proactor_Test_Backend::is_available/proactor_available(proactor_type)) and
return false (or perform a hard exit) when the chosen backend cannot be
instantiated so tests fail fast instead of waiting forever.

Comment on lines 1955 to +1957
set_proactor_type (const ACE_TCHAR *ptype)
{
if (!ptype)
return 0;

switch (ACE_OS::ace_toupper (*ptype))
{
case 'D':
proactor_type = DEFAULT;
return 1;
case 'A':
proactor_type = AIOCB;
return 1;
case 'I':
proactor_type = SIG;
return 1;
#if defined (sun)
case 'S':
proactor_type = SUN;
return 1;
#endif /* sun */
#if !defined (ACE_HAS_BROKEN_SIGEVENT_STRUCT)
case 'C':
proactor_type = CB;
return 1;
#endif /* !ACE_HAS_BROKEN_SIGEVENT_STRUCT */
default:
break;
}
return 0;
return Proactor_Test_Backend::parse_type (ptype, proactor_type) == 0;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't accept unsupported backends here.

parse_type() only validates the token. It now lets platform-incompatible backends through, and when task1.start() later fails this test still drops into the while (!test.testing_done()) wait with zero sessions, so the process can hang instead of failing fast. Please keep the old availability check here, or make run_main() return immediately on startup failure.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ACE/tests/Proactor_UDP_Test.cpp` around lines 1955 - 1957, set_proactor_type
currently calls Proactor_Test_Backend::parse_type(ptype, proactor_type) which
only validates the token and allows platform-incompatible backends through;
update set_proactor_type to also check backend availability (reuse the previous
availability check in Proactor_Test_Backend or call a new
is_available()/validate_availability method) and return false if the backend is
unsupported so the test fails fast, or alternatively ensure run_main() detects
startup failure (e.g., task1.start() failure) and returns immediately; reference
Proactor_Test_Backend::parse_type, set_proactor_type, run_main, and task1.start
when applying the change.

int shutting_down_;

/// Serialize access to the shutdown flag between destroy() and svc().
ACE_Thread_Mutex shutting_down_lock_;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not a atomic?

@jwillemsen
Copy link
Copy Markdown
Member

jwillemsen commented Apr 8, 2026

Does this fixed any of the open proactor bugzilla issues? Is #2479 something related?

@jwillemsen jwillemsen changed the title Linux Proactor Improvements [ace6tao2] Linux Proactor Improvements Apr 8, 2026
Copy link
Copy Markdown
Member

@jwillemsen jwillemsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did just a quick scan, not a detailed review, for master there are several places where C++17 features should be used

ACE_Select_Reactor select_reactor_;

ACE_Reactor reactor_;
ACE_Atomic_Op<ACE_SYNCH_MUTEX, int> started_;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bool?

{
ACE_SEH_TRY
{
this->result_->complete (this->bytes_transferred_,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just all in one line?


for (ai = 0; ai < aiocb_list_max_size_; ai++)
{
if (this->aiocb_list_[ai] == 0 ) // not active operation
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space


if (flg_completed == 0) // not completed
{
num_pending++;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefix increment

aiocb_list_[i] = result;
num_deferred_aiocb_ --;
this->aiocb_list_[deferred_index] = result;
this->num_deferred_aiocb_ --;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefix decrement

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants