
feat: (Python) Add async context manager#487

Open
qzyu999 wants to merge 2 commits into apache:main from qzyu999:feat/456-python-async-context-manager

Conversation


@qzyu999 qzyu999 commented Apr 10, 2026

Purpose

Linked issue: close #456

Implement asynchronous context managers (async with) for AppendWriter, UpsertWriter, and LogScanner in the Python bindings, ensuring proper resource lifecycle management and automated, non-blocking flushing of records.

Brief change log

  • Writer Context Managers (AppendWriter, UpsertWriter): Implemented __aenter__ and __aexit__ protocols.
    • Happy Path: Automatically awaits flush() on normal exit, guaranteeing data delivery before releasing the context.
    • Exception Path: Bypasses flush() to instantly free the Python asyncio event loop (fail-fast semantics). Note: Because the RecordAccumulator is a shared resource on the connection, this relies on a "best-effort" non-blocking design. It avoids calling close() or abort() to prevent permanently bricking the shared MemoryLimiter, meaning records appended prior to the exception may still be transmitted by the background Tokio thread.
  • Scanner Context Managers (LogScanner, RecordBatchLogScanner): Implemented __aenter__ and __aexit__ to establish the API contract for asynchronous resource reclamation.
  • Documentation & Type Hints: Updated __init__.pyi docstrings to clearly document the best-effort transactional semantics so developers understand the limits of client-side atomicity.
  • Test Stability: Fixed a micro-race condition in test_log_table.py::test_list_offsets by adding an explicit asyncio.sleep(0.1) delay, ensuring strict chronological ordering when resolving timestamp-based offsets.
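The flush-on-success / skip-on-exception contract described above can be sketched in pure Python. `SketchWriter` and `_flush_async` are hypothetical stand-ins for illustration only; the real implementation lives in the PyO3 bindings:

```python
import asyncio

class SketchWriter:
    """Hypothetical writer illustrating the exit contract above."""

    def __init__(self):
        self.flushed = False

    async def _flush_async(self):
        # Stand-in for the real network flush() on the PyO3 writer.
        self.flushed = True

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc_value, traceback):
        # Happy path: await the flush so data is delivered before exit.
        # Exception path: skip the flush to free the event loop at once;
        # records already appended may still be sent in the background.
        if exc_type is None:
            await self._flush_async()
        return False  # never suppress the caller's exception

async def demo():
    async with SketchWriter() as ok_writer:
        pass  # clean exit -> flushed

    try:
        async with SketchWriter() as err_writer:
            raise RuntimeError("abort")
    except RuntimeError:
        pass  # exceptional exit -> flush skipped

    return ok_writer.flushed, err_writer.flushed

print(asyncio.run(demo()))  # → (True, False)
```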

Tests

Added comprehensive coverage in a new bindings/python/test/test_context_manager.py suite:

  • Happy Path Verification: test_append_writer_success_flush and test_upsert_writer_context_manager verify that data is automatically flushed and available to scanners/lookupers without explicit flush() calls.
  • Non-Blocking Exception Path: test_append_writer_exception_no_flush uses time.perf_counter() to assert that exiting a failed context block takes < 0.1s, demonstrating that the network round-trip flush wait is bypassed without tearing down the connection for subsequent tests.
  • Scanner Coverage: Explicit tests added for both create_log_scanner() and create_record_batch_log_scanner() resource lifecycle bounds, including exception propagation.

API and Format

  • API: Yes. This introduces new API surface to the Python bindings by adding the __aenter__ and __aexit__ magic methods, enabling async with syntax.
  • Format: No changes to the underlying storage format or the core Rust RPC protocols.

Documentation

Yes. This introduces a new feature for the Python client. Python type hints (__init__.pyi) and docstrings have been updated to reflect the new syntax and explicitly document the behavior of the exception fault-path.

Copilot AI left a comment

Pull request overview

Adds Python async with support for Fluss client objects to improve lifecycle handling in async code, along with tests and example updates.

Changes:

  • Implemented __aenter__ / __aexit__ for FlussConnection, AppendWriter, UpsertWriter, and LogScanner (Rust → PyO3 bindings).
  • Added a new async context-manager-focused test suite and adjusted an existing offset test to reduce a timing race.
  • Updated Python type hints and the Python example to demonstrate async with usage.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 11 comments.

Show a summary per file

  • bindings/python/src/connection.rs: Adds async context manager methods for FlussConnection.
  • bindings/python/src/table.rs: Adds async context manager methods for AppendWriter and LogScanner (plus a close() stub).
  • bindings/python/src/upsert.rs: Adds async context manager methods for UpsertWriter and a doc tweak.
  • bindings/python/fluss/__init__.pyi: Updates the Python typing surface to include async context manager methods.
  • bindings/python/test/test_context_manager.py: New tests covering async with behavior for writers/scanners/connection.
  • bindings/python/test/test_log_table.py: Adds a short sleep to reduce a timestamp-offset ordering race.
  • bindings/python/example/example.py: Migrates the example to async with patterns (currently introduces correctness/syntax issues).
Comments suppressed due to low confidence (3)

bindings/python/example/example.py:112

  • append_writer is created inside an async with block but then used after the block has already exited. This defeats the purpose of the async context manager (auto-flush-on-exit) and will break once the writer/connection gains a real close() implementation. Please move the write/append logic inside the async with scope (or don’t use async with here).
    # Create a writer for the table
    async with table.new_append().create_writer() as append_writer:
        print(f"Created append writer: {append_writer}")

    try:
        # Demo: Write PyArrow Table
        print("\n--- Testing PyArrow Table write ---")

bindings/python/example/example.py:273

  • batch_scanner is created inside async with ... as batch_scanner: but subscribe_buckets() and subsequent reads are performed after the context has exited. This makes the example misleading and will break if/when scanners implement real close semantics. Keep all scanner usage inside the async with block (or avoid the context manager here).
    try:
        # Use new_scan().create_record_batch_log_scanner() for batch-based operations
        async with await table.new_scan().create_record_batch_log_scanner() as batch_scanner:
            print(f"Created batch scanner: {batch_scanner}")

        # Subscribe to buckets (required before to_arrow/to_pandas)
        # Use subscribe_buckets to subscribe all buckets from EARLIEST_OFFSET
        num_buckets = (await admin.get_table_info(table_path)).num_buckets
        batch_scanner.subscribe_buckets({i: fluss.EARLIEST_OFFSET for i in range(num_buckets)})
        print(f"Subscribed to {num_buckets} buckets from EARLIEST_OFFSET")

bindings/python/example/example.py:445

  • The async with pk_table.new_upsert().create_writer() as upsert_writer: block exits immediately after printing, but upsert_writer is then used for all subsequent upserts. This means the example does not actually run the writes inside the context manager (so no auto-flush-on-exit behavior is exercised) and it will break if the writer later implements real close semantics. Please move the upsert logic inside the async with block.
    try:
        async with pk_table.new_upsert().create_writer() as upsert_writer:
            print(f"Created upsert writer: {upsert_writer}")

        # Fire-and-forget: queue writes synchronously, flush at end.
        # Records are batched internally for efficiency.
        upsert_writer.upsert(
            {


Comment on lines 42 to 46
# Create connection using the static create method
conn = await fluss.FlussConnection.create(config)

# Define fields for PyArrow
async with await fluss.FlussConnection.create(config) as conn:
# Define fields for PyArrow
fields = [
pa.field("id", pa.int32()),
Copilot AI Apr 11, 2026

The async with await FlussConnection.create(...) as conn: block has no executable statement in its body (only a comment), and fields = [...] is dedented. This will raise a SyntaxError (“expected an indented block”) and also ends the connection context immediately. Please indent the subsequent setup logic under the async with (or add a real statement inside the block) so the connection remains open for the rest of main().

Comment on lines +113 to +127
// Exit the async runtime context (for 'async with' statement)
#[pyo3(signature = (_exc_type=None, _exc_value=None, _traceback=None))]
fn __aexit__<'py>(
    &mut self,
    py: Python<'py>,
    _exc_type: Option<Bound<'py, PyAny>>,
    _exc_value: Option<Bound<'py, PyAny>>,
    _traceback: Option<Bound<'py, PyAny>>,
) -> PyResult<Bound<'py, PyAny>> {
    future_into_py(py, async move {
        // In the future, we could call an async close on the core connection here
        // e.g., client.close().await;
        Ok(false)
    })
}
Copilot AI Apr 11, 2026

__exit__ calls self.close()?, but the new async context manager __aexit__ does not. This means async with await FlussConnection.create(...) will not close the connection (even if close() is implemented later). Please mirror __exit__ by calling self.close()? in __aexit__ (it can be done before creating the future since close() is synchronous).

Comment on lines +998 to +1021
// Exit the async runtime context (for 'async with' statement)
/// On successful exit, the writer is automatically flushed.
/// If an exception occurs, the flush is skipped to allow immediate error
/// propagation, though pending records may still be sent in the background.
#[pyo3(signature = (exc_type=None, _exc_value=None, _traceback=None))]
fn __aexit__<'py>(
    &self,
    py: Python<'py>,
    exc_type: Option<Bound<'py, PyAny>>,
    _exc_value: Option<Bound<'py, PyAny>>,
    _traceback: Option<Bound<'py, PyAny>>,
) -> PyResult<Bound<'py, PyAny>> {
    let has_error = exc_type.is_some();
    let inner = self.inner.clone();
    future_into_py(py, async move {
        if !has_error {
            inner
                .flush()
                .await
                .map_err(|e| FlussError::from_core_error(&e))?;
        }
        Ok(false)
    })
}
Copilot AI Apr 11, 2026

AppendWriter.__aexit__ only flushes on the success path and never closes/invalidates the writer. After leaving async with, the writer object remains fully usable and any underlying resources are not deterministically released, which doesn’t match the linked issue’s “flush then close / close on exception” contract. Consider adding an explicit close() (even if initially a no-op) and calling it from __aexit__, or otherwise marking the writer as closed so further writes fail fast.

Comment on lines +2366 to +2389
/// Close the scanner
pub fn close(&self) -> PyResult<()> {
    Ok(())
}

// Enter the async runtime context (for 'async with' statement)
fn __aenter__<'py>(slf: PyRef<'py, Self>, py: Python<'py>) -> PyResult<Bound<'py, PyAny>> {
    let py_slf = slf.into_pyobject(py)?.unbind();
    future_into_py(py, async move { Ok(py_slf) })
}

// Exit the async runtime context (for 'async with' statement)
#[pyo3(signature = (_exc_type=None, _exc_value=None, _traceback=None))]
fn __aexit__<'py>(
    &self,
    py: Python<'py>,
    _exc_type: Option<Bound<'py, PyAny>>,
    _exc_value: Option<Bound<'py, PyAny>>,
    _traceback: Option<Bound<'py, PyAny>>,
) -> PyResult<Bound<'py, PyAny>> {
    future_into_py(py, async move {
        // In the future, we can call an async close on the core scanner here
        Ok(false)
    })
Copilot AI Apr 11, 2026

LogScanner.close() is currently a no-op, and __aexit__ doesn’t call it. This makes async with ... as scanner: misleading because it doesn’t actually perform any cleanup. If the core scanner doesn’t need cleanup, consider either documenting close() as a no-op or removing it; otherwise, implement actual cleanup and invoke it from __aexit__ (and potentially from __del__ for safety).
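The pattern suggested here (an idempotent close() invoked from __aexit__, with __del__ as a last-resort safety net) can be sketched in Python. `SketchScanner` is a hypothetical illustration, not the real binding:

```python
import asyncio

class SketchScanner:
    """Hypothetical scanner illustrating close-from-__aexit__/__del__."""

    def __init__(self):
        self.closed = False

    def close(self):
        # Idempotent cleanup hook; safe to call more than once.
        self.closed = True

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc_value, traceback):
        # Deterministic cleanup on context exit, success or failure.
        self.close()
        return False

    def __del__(self):
        # Safety net if the context manager was skipped entirely.
        if not self.closed:
            self.close()

async def demo():
    async with SketchScanner() as scanner:
        pass
    return scanner.closed

print(asyncio.run(demo()))  # → True
```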

///
/// Returns:
/// None on success
/// Flush any pending data
Copilot AI Apr 11, 2026

There’s an extra doc line /// Flush any pending data inserted after the Returns: section for flush(). It’s redundant with the existing docstring and makes the generated docs read oddly. Please remove this stray line or integrate it into the main flush doc comment above.

Suggested change
/// Flush any pending data

Comment on lines +75 to +88
start_time = time.perf_counter()
try:
    async with table.new_append().create_writer() as writer:
        writer.append({"a": 100})
        raise TestException("abort")
except TestException:
    pass
duration = time.perf_counter() - start_time

# Verification:
# 1. The exception was propagated immediately.
# 2. The block exited nearly instantly because it bypassed the network flush.
assert duration < 0.1, f"Context exit took too long ({duration:.3f}s), likely performed a flush"

Copilot AI Apr 11, 2026

The test asserts the whole async with block completes in <0.1s. On slower/loaded CI machines this timing assertion can be flaky even when flush() is correctly skipped (context manager overhead + scheduling jitter). Consider using a more tolerant threshold, or assert behavior via mocking/observability (e.g., verifying flush() was not awaited / no records are guaranteed to be acknowledged) rather than wall-clock timing.
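A behavioral assertion along these lines can be sketched with unittest.mock. `FakeWriter` and `run_exception_path` are hypothetical names; the real PyO3 writer may need a thin Python wrapper to be observable like this:

```python
import asyncio
from unittest.mock import AsyncMock

class FakeWriter:
    """Hypothetical stand-in whose flush() is a spy, not a network call."""

    def __init__(self):
        self.flush = AsyncMock()

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc_value, traceback):
        # Same contract as the binding: flush only on the success path.
        if exc_type is None:
            await self.flush()
        return False

async def run_exception_path():
    writer = FakeWriter()
    try:
        async with writer:
            raise ValueError("abort")
    except ValueError:
        pass
    return writer

w = asyncio.run(run_exception_path())
w.flush.assert_not_awaited()  # behavioral check, no wall-clock dependence
```

This keeps the test deterministic under CI scheduler jitter, at the cost of exercising a wrapper rather than the raw binding.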

Comment on lines +112 to +122
class TestException(Exception): pass

start_time = time.perf_counter()
try:
    async with table.new_upsert().create_writer() as writer:
        writer.upsert({"id": 2, "v": "b"})
        raise TestException("abort")
except TestException:
    pass
duration = time.perf_counter() - start_time
assert duration < 0.1, f"Context exit took too long ({duration:.3f}s), likely performed a flush"

Copilot AI Apr 11, 2026

Same concern as the append-writer timing test: asserting duration < 0.1 for the exception path is prone to CI flakiness and may fail due to scheduler jitter unrelated to flushing. Prefer a less brittle threshold or a behavioral assertion that doesn’t depend on wall-clock timing.

assert latest[0] == 0

before_append_ms = int(time.time() * 1000)
await asyncio.sleep(0.1)
Copilot AI Apr 11, 2026

Using a fixed await asyncio.sleep(0.1) to separate before_append_ms from the subsequent writes adds an arbitrary delay and can still be fragile under clock granularity/skew. A more deterministic approach is to wait until the millisecond clock advances (e.g., loop until int(time.time()*1000) > before_append_ms) before appending, which avoids hard-coding a 100ms sleep.

Suggested change
await asyncio.sleep(0.1)
while int(time.time() * 1000) <= before_append_ms:
    await asyncio.sleep(0)
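The suggested clock-advance wait can be exercised standalone; `wait_for_next_millisecond` is a hypothetical helper name wrapping the loop from the suggestion:

```python
import asyncio
import time

async def wait_for_next_millisecond():
    """Yield to the event loop until the millisecond clock advances."""
    before_ms = int(time.time() * 1000)
    while int(time.time() * 1000) <= before_ms:
        # sleep(0) yields without adding an arbitrary fixed delay.
        await asyncio.sleep(0)
    return before_ms

async def demo():
    before = await wait_for_next_millisecond()
    # Any timestamp taken now is strictly after `before`.
    return int(time.time() * 1000) > before

print(asyncio.run(demo()))  # → True
```

Unlike a fixed 100 ms sleep, this waits exactly as long as the clock granularity requires, which keeps the ordering guarantee without padding the test.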

print("\n--- Flushing data ---")
await append_writer.flush()
print("Successfully flushed data")
# Note: flush() and close() are automatically called by the 'async with' block on successful exit.
Copilot AI Apr 11, 2026

This comment says flush() and close() are automatically called when leaving the async with block, but the current AppendWriter.__aexit__ implementation only flushes and there is no writer close() in the Python API. Please update the example text to match the actual behavior (or implement/introduce close() and call it from __aexit__).

Suggested change
# Note: flush() and close() are automatically called by the 'async with' block on successful exit.
# Note: flush() is automatically called by the 'async with' block on successful exit.

Comment on lines +491 to +492
# flush() and close() are automatically called by the 'async with' block on successful exit.
# Bypass manual flush:
Copilot AI Apr 11, 2026

This comment claims the async with block will automatically call both flush() and close(), but UpsertWriter.__aexit__ currently only flushes and does not close/invalidate the writer. Please update the example comment to reflect reality (or add close semantics and invoke them from __aexit__).

Suggested change
# flush() and close() are automatically called by the 'async with' block on successful exit.
# Bypass manual flush:
# flush() is automatically called by the 'async with' block on successful exit.
# No manual flush is needed here:


Development

Successfully merging this pull request may close these issues.

Add async context manager support in python