
feat: (Python) Add async context manager#487

Open
qzyu999 wants to merge 2 commits into apache:main from qzyu999:feat/456-python-async-context-manager

Conversation


@qzyu999 qzyu999 commented Apr 10, 2026

Purpose

Linked issue: close #456

Implement asynchronous context managers (async with) for AppendWriter, UpsertWriter, and LogScanner in the Python bindings, ensuring proper resource lifecycle management and automated, non-blocking flushing of records.

Brief change log

  • Writer Context Managers (AppendWriter, UpsertWriter): Implemented __aenter__ and __aexit__ protocols.
    • Happy Path: Automatically awaits flush() on normal exit, guaranteeing data delivery before releasing the context.
    • Exception Path: Bypasses flush() to instantly free the Python asyncio event loop (fail-fast semantics). Note: Because the RecordAccumulator is a shared resource on the connection, this relies on a "best-effort" non-blocking design. It avoids calling close() or abort() to prevent permanently bricking the shared MemoryLimiter, meaning records appended prior to the exception may still be transmitted by the background Tokio thread.
  • Scanner Context Managers (LogScanner, RecordBatchLogScanner): Implemented __aenter__ and __aexit__ to establish the API contract for asynchronous resource reclamation.
  • Documentation & Type Hints: Updated __init__.pyi docstrings to clearly document the best-effort transactional semantics so developers understand the limits of client-side atomicity.
  • Test Stability: Fixed a micro-race condition in test_log_table.py::test_list_offsets by adding an explicit asyncio.sleep(0.1) delay, ensuring strict chronological ordering when resolving timestamp-based offsets.
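The flush-on-success / skip-on-exception contract described above can be sketched in pure Python. `SketchWriter` and `_flush_async` are hypothetical stand-ins for illustration only; the real implementation lives in the PyO3 bindings:

```python
import asyncio

class SketchWriter:
    """Hypothetical writer illustrating the exit contract above."""

    def __init__(self):
        self.flushed = False

    async def _flush_async(self):
        # Stand-in for the real network flush() on the PyO3 writer.
        self.flushed = True

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc_value, traceback):
        # Happy path: await the flush so data is delivered before exit.
        # Exception path: skip the flush to free the event loop at once;
        # records already appended may still be sent in the background.
        if exc_type is None:
            await self._flush_async()
        return False  # never suppress the caller's exception

async def demo():
    async with SketchWriter() as ok_writer:
        pass  # clean exit -> flushed

    try:
        async with SketchWriter() as err_writer:
            raise RuntimeError("abort")
    except RuntimeError:
        pass  # exceptional exit -> flush skipped

    return ok_writer.flushed, err_writer.flushed

print(asyncio.run(demo()))  # → (True, False)
```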

Tests

Added comprehensive coverage in a new bindings/python/test/test_context_manager.py suite:

  • Happy Path Verification: test_append_writer_success_flush and test_upsert_writer_context_manager verify that data is automatically flushed and available to scanners/lookupers without explicit flush() calls.
  • Non-Blocking Exception Path: test_append_writer_exception_no_flush uses time.perf_counter() to assert that exiting a failed context block takes < 0.1s, demonstrating that the network round-trip flush wait is bypassed without tearing down the connection for subsequent tests.
  • Scanner Coverage: Explicit tests added for both create_log_scanner() and create_record_batch_log_scanner() resource lifecycle bounds, including exception propagation.

API and Format

  • API: Yes. This introduces new API surface to the Python bindings by adding the __aenter__ and __aexit__ magic methods, enabling async with syntax.
  • Format: No changes to the underlying storage format or the core Rust RPC protocols.

Documentation

Yes. This introduces a new feature for the Python client. Python type hints (__init__.pyi) and docstrings have been updated to reflect the new syntax and explicitly document the behavior of the exception fault-path.

Copilot AI left a comment

Pull request overview

Adds Python async with support for Fluss client objects to improve lifecycle handling in async code, along with tests and example updates.

Changes:

  • Implemented __aenter__ / __aexit__ for FlussConnection, AppendWriter, UpsertWriter, and LogScanner (Rust → PyO3 bindings).
  • Added a new async context-manager-focused test suite and adjusted an existing offset test to reduce a timing race.
  • Updated Python type hints and the Python example to demonstrate async with usage.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 11 comments.

Show a summary per file

  • bindings/python/src/connection.rs: Adds async context manager methods for FlussConnection.
  • bindings/python/src/table.rs: Adds async context manager methods for AppendWriter and LogScanner (plus a close() stub).
  • bindings/python/src/upsert.rs: Adds async context manager methods for UpsertWriter and a doc tweak.
  • bindings/python/fluss/__init__.pyi: Updates the Python typing surface to include async context manager methods.
  • bindings/python/test/test_context_manager.py: New tests covering async with behavior for writers/scanners/connection.
  • bindings/python/test/test_log_table.py: Adds a short sleep to reduce a timestamp-offset ordering race.
  • bindings/python/example/example.py: Migrates the example to async with patterns (currently introduces correctness/syntax issues).
Comments suppressed due to low confidence (3)

bindings/python/example/example.py:112

  • append_writer is created inside an async with block but then used after the block has already exited. This defeats the purpose of the async context manager (auto-flush-on-exit) and will break once the writer/connection gains a real close() implementation. Please move the write/append logic inside the async with scope (or don’t use async with here).
    # Create a writer for the table
    async with table.new_append().create_writer() as append_writer:
        print(f"Created append writer: {append_writer}")

    try:
        # Demo: Write PyArrow Table
        print("\n--- Testing PyArrow Table write ---")

bindings/python/example/example.py:273

  • batch_scanner is created inside async with ... as batch_scanner: but subscribe_buckets() and subsequent reads are performed after the context has exited. This makes the example misleading and will break if/when scanners implement real close semantics. Keep all scanner usage inside the async with block (or avoid the context manager here).
    try:
        # Use new_scan().create_record_batch_log_scanner() for batch-based operations
        async with await table.new_scan().create_record_batch_log_scanner() as batch_scanner:
            print(f"Created batch scanner: {batch_scanner}")

        # Subscribe to buckets (required before to_arrow/to_pandas)
        # Use subscribe_buckets to subscribe all buckets from EARLIEST_OFFSET
        num_buckets = (await admin.get_table_info(table_path)).num_buckets
        batch_scanner.subscribe_buckets({i: fluss.EARLIEST_OFFSET for i in range(num_buckets)})
        print(f"Subscribed to {num_buckets} buckets from EARLIEST_OFFSET")

bindings/python/example/example.py:445

  • The async with pk_table.new_upsert().create_writer() as upsert_writer: block exits immediately after printing, but upsert_writer is then used for all subsequent upserts. This means the example does not actually run the writes inside the context manager (so no auto-flush-on-exit behavior is exercised) and it will break if the writer later implements real close semantics. Please move the upsert logic inside the async with block.
    try:
        async with pk_table.new_upsert().create_writer() as upsert_writer:
            print(f"Created upsert writer: {upsert_writer}")

        # Fire-and-forget: queue writes synchronously, flush at end.
        # Records are batched internally for efficiency.
        upsert_writer.upsert(
            {


Comment on lines 42 to 46
# Create connection using the static create method
conn = await fluss.FlussConnection.create(config)

# Define fields for PyArrow
async with await fluss.FlussConnection.create(config) as conn:
# Define fields for PyArrow
fields = [
pa.field("id", pa.int32()),
Copilot AI Apr 11, 2026

The async with await FlussConnection.create(...) as conn: block has no executable statement in its body (only a comment), and fields = [...] is dedented. This will raise a SyntaxError (“expected an indented block”) and also ends the connection context immediately. Please indent the subsequent setup logic under the async with (or add a real statement inside the block) so the connection remains open for the rest of main().

Comment on lines +113 to +127
// Exit the async runtime context (for 'async with' statement)
#[pyo3(signature = (_exc_type=None, _exc_value=None, _traceback=None))]
fn __aexit__<'py>(
    &mut self,
    py: Python<'py>,
    _exc_type: Option<Bound<'py, PyAny>>,
    _exc_value: Option<Bound<'py, PyAny>>,
    _traceback: Option<Bound<'py, PyAny>>,
) -> PyResult<Bound<'py, PyAny>> {
    future_into_py(py, async move {
        // In the future, we could call an async close on the core connection here
        // e.g., client.close().await;
        Ok(false)
    })
}
Copilot AI Apr 11, 2026

__exit__ calls self.close()?, but the new async context manager __aexit__ does not. This means async with await FlussConnection.create(...) will not close the connection (even if close() is implemented later). Please mirror __exit__ by calling self.close()? in __aexit__ (it can be done before creating the future since close() is synchronous).

Comment on lines +998 to +1021
// Exit the async runtime context (for 'async with' statement)
/// On successful exit, the writer is automatically flushed.
/// If an exception occurs, the flush is skipped to allow immediate error
/// propagation, though pending records may still be sent in the background.
#[pyo3(signature = (exc_type=None, _exc_value=None, _traceback=None))]
fn __aexit__<'py>(
    &self,
    py: Python<'py>,
    exc_type: Option<Bound<'py, PyAny>>,
    _exc_value: Option<Bound<'py, PyAny>>,
    _traceback: Option<Bound<'py, PyAny>>,
) -> PyResult<Bound<'py, PyAny>> {
    let has_error = exc_type.is_some();
    let inner = self.inner.clone();
    future_into_py(py, async move {
        if !has_error {
            inner
                .flush()
                .await
                .map_err(|e| FlussError::from_core_error(&e))?;
        }
        Ok(false)
    })
}
Copilot AI Apr 11, 2026

AppendWriter.__aexit__ only flushes on the success path and never closes/invalidates the writer. After leaving async with, the writer object remains fully usable and any underlying resources are not deterministically released, which doesn’t match the linked issue’s “flush then close / close on exception” contract. Consider adding an explicit close() (even if initially a no-op) and calling it from __aexit__, or otherwise marking the writer as closed so further writes fail fast.

Comment on lines +2366 to +2389
/// Close the scanner
pub fn close(&self) -> PyResult<()> {
    Ok(())
}

// Enter the async runtime context (for 'async with' statement)
fn __aenter__<'py>(slf: PyRef<'py, Self>, py: Python<'py>) -> PyResult<Bound<'py, PyAny>> {
    let py_slf = slf.into_pyobject(py)?.unbind();
    future_into_py(py, async move { Ok(py_slf) })
}

// Exit the async runtime context (for 'async with' statement)
#[pyo3(signature = (_exc_type=None, _exc_value=None, _traceback=None))]
fn __aexit__<'py>(
    &self,
    py: Python<'py>,
    _exc_type: Option<Bound<'py, PyAny>>,
    _exc_value: Option<Bound<'py, PyAny>>,
    _traceback: Option<Bound<'py, PyAny>>,
) -> PyResult<Bound<'py, PyAny>> {
    future_into_py(py, async move {
        // In the future, we can call an async close on the core scanner here
        Ok(false)
    })
Copilot AI Apr 11, 2026

LogScanner.close() is currently a no-op, and __aexit__ doesn’t call it. This makes async with ... as scanner: misleading because it doesn’t actually perform any cleanup. If the core scanner doesn’t need cleanup, consider either documenting close() as a no-op or removing it; otherwise, implement actual cleanup and invoke it from __aexit__ (and potentially from __del__ for safety).
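The pattern suggested here (an idempotent close() invoked from __aexit__, with __del__ as a last-resort safety net) can be sketched in Python. `SketchScanner` is a hypothetical illustration, not the real binding:

```python
import asyncio

class SketchScanner:
    """Hypothetical scanner illustrating close-from-__aexit__/__del__."""

    def __init__(self):
        self.closed = False

    def close(self):
        # Idempotent cleanup hook; safe to call more than once.
        self.closed = True

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc_value, traceback):
        # Deterministic cleanup on context exit, success or failure.
        self.close()
        return False

    def __del__(self):
        # Safety net if the context manager was skipped entirely.
        if not self.closed:
            self.close()

async def demo():
    async with SketchScanner() as scanner:
        pass
    return scanner.closed

print(asyncio.run(demo()))  # → True
```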

///
/// Returns:
/// None on success
/// Flush any pending data
Copilot AI Apr 11, 2026

There’s an extra doc line /// Flush any pending data inserted after the Returns: section for flush(). It’s redundant with the existing docstring and makes the generated docs read oddly. Please remove this stray line or integrate it into the main flush doc comment above.

Suggested change
/// Flush any pending data

Comment on lines +75 to +88
start_time = time.perf_counter()
try:
    async with table.new_append().create_writer() as writer:
        writer.append({"a": 100})
        raise TestException("abort")
except TestException:
    pass
duration = time.perf_counter() - start_time

# Verification:
# 1. The exception was propagated immediately.
# 2. The block exited nearly instantly because it bypassed the network flush.
assert duration < 0.1, f"Context exit took too long ({duration:.3f}s), likely performed a flush"

Copilot AI Apr 11, 2026

The test asserts the whole async with block completes in <0.1s. On slower/loaded CI machines this timing assertion can be flaky even when flush() is correctly skipped (context manager overhead + scheduling jitter). Consider using a more tolerant threshold, or assert behavior via mocking/observability (e.g., verifying flush() was not awaited / no records are guaranteed to be acknowledged) rather than wall-clock timing.
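A behavioral assertion along these lines can be sketched with unittest.mock. `FakeWriter` and `run_exception_path` are hypothetical names; the real PyO3 writer may need a thin Python wrapper to be observable like this:

```python
import asyncio
from unittest.mock import AsyncMock

class FakeWriter:
    """Hypothetical stand-in whose flush() is a spy, not a network call."""

    def __init__(self):
        self.flush = AsyncMock()

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc_value, traceback):
        # Same contract as the binding: flush only on the success path.
        if exc_type is None:
            await self.flush()
        return False

async def run_exception_path():
    writer = FakeWriter()
    try:
        async with writer:
            raise ValueError("abort")
    except ValueError:
        pass
    return writer

w = asyncio.run(run_exception_path())
w.flush.assert_not_awaited()  # behavioral check, no wall-clock dependence
```

This keeps the test deterministic under CI scheduler jitter, at the cost of exercising a wrapper rather than the raw binding.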

Comment on lines +112 to +122
class TestException(Exception): pass

start_time = time.perf_counter()
try:
    async with table.new_upsert().create_writer() as writer:
        writer.upsert({"id": 2, "v": "b"})
        raise TestException("abort")
except TestException:
    pass
duration = time.perf_counter() - start_time
assert duration < 0.1, f"Context exit took too long ({duration:.3f}s), likely performed a flush"

Copilot AI Apr 11, 2026

Same concern as the append-writer timing test: asserting duration < 0.1 for the exception path is prone to CI flakiness and may fail due to scheduler jitter unrelated to flushing. Prefer a less brittle threshold or a behavioral assertion that doesn’t depend on wall-clock timing.

assert latest[0] == 0

before_append_ms = int(time.time() * 1000)
await asyncio.sleep(0.1)
Copilot AI Apr 11, 2026

Using a fixed await asyncio.sleep(0.1) to separate before_append_ms from the subsequent writes adds an arbitrary delay and can still be fragile under clock granularity/skew. A more deterministic approach is to wait until the millisecond clock advances (e.g., loop until int(time.time()*1000) > before_append_ms) before appending, which avoids hard-coding a 100ms sleep.

Suggested change
await asyncio.sleep(0.1)
while int(time.time() * 1000) <= before_append_ms:
    await asyncio.sleep(0)
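The suggested clock-advance wait can be exercised standalone; `wait_for_next_millisecond` is a hypothetical helper name wrapping the loop from the suggestion:

```python
import asyncio
import time

async def wait_for_next_millisecond():
    """Yield to the event loop until the millisecond clock advances."""
    before_ms = int(time.time() * 1000)
    while int(time.time() * 1000) <= before_ms:
        # sleep(0) yields without adding an arbitrary fixed delay.
        await asyncio.sleep(0)
    return before_ms

async def demo():
    before = await wait_for_next_millisecond()
    # Any timestamp taken now is strictly after `before`.
    return int(time.time() * 1000) > before

print(asyncio.run(demo()))  # → True
```

Unlike a fixed 100 ms sleep, this waits exactly as long as the clock granularity requires, which keeps the ordering guarantee without padding the test.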

print("\n--- Flushing data ---")
await append_writer.flush()
print("Successfully flushed data")
# Note: flush() and close() are automatically called by the 'async with' block on successful exit.
Copilot AI Apr 11, 2026

This comment says flush() and close() are automatically called when leaving the async with block, but the current AppendWriter.__aexit__ implementation only flushes and there is no writer close() in the Python API. Please update the example text to match the actual behavior (or implement/introduce close() and call it from __aexit__).

Suggested change
# Note: flush() and close() are automatically called by the 'async with' block on successful exit.
# Note: flush() is automatically called by the 'async with' block on successful exit.

Comment on lines +491 to +492
# flush() and close() are automatically called by the 'async with' block on successful exit.
# Bypass manual flush:
Copilot AI Apr 11, 2026

This comment claims the async with block will automatically call both flush() and close(), but UpsertWriter.__aexit__ currently only flushes and does not close/invalidate the writer. Please update the example comment to reflect reality (or add close semantics and invoke them from __aexit__).

Suggested change
# flush() and close() are automatically called by the 'async with' block on successful exit.
# Bypass manual flush:
# flush() is automatically called by the 'async with' block on successful exit.
# No manual flush is needed here:


Development

Successfully merging this pull request may close these issues.

Add async context manager support in python