Add ORFS Integration test by luarss · Pull Request #37 · luarss/openroad-mcp

luarss · 2026-02-23T18:50:46Z

Summary by CodeRabbit

Tests
- Added comprehensive integration tests for MCP-driven ORFS and session lifecycles, plus updated interactive tests and fixtures to improve coverage and robustness.
Documentation
- Roadmap updated: marked the integration-testing checklist item complete and refreshed last-updated timestamp.
Bug Fixes / Improvements
- Improved session startup/teardown behavior (draining output and safer termination) and relaxed performance/test timing expectations to accommodate the startup delay.

- Add comprehensive test suite for MCP tool discovery, session lifecycle, OpenROAD sessions, and ORFS report image tools - Fix conftest.py to disable command validation for bash sessions - Use runtime-evaluated ORFS_FLOW_PATH via get_orfs_flow_path() function to properly respect environment variable configuration - Fix test_session_metrics_aggregation to access nested response structure Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai · 2026-02-23T18:51:07Z

📝 Walkthrough

Walkthrough

Marks ORFS integration tests complete in ROADMAP, adds extensive MCP/ORFS integration tests and fixtures, updates interactive test scaffolding and performance thresholds, and changes OpenROADManager initialization and session startup to wait and drain output before marking sessions active.

Changes

Cohort / File(s)	Summary
Roadmap `ROADMAP.md`	Marked "Integration tests with real ORFS flows" complete; updated "Last Updated" to 2026-02-23.
Integration test fixtures & clients `tests/integration/conftest.py`	Replaced `event_loop` fixture with internal asynccontextmanager `_mcp_session`; added function-scoped `mcp_client` fixture that builds `StdioServerParameters` with a merged `env` including `OPENROAD_ENABLE_COMMAND_VALIDATION="false"`; refined teardown and error suppression.
Integration tests (new) `tests/integration/test_orfs_flows.py`	Added comprehensive MCP/ORFS integration tests covering tool discovery, session lifecycle (create/execute/inspect/history/metrics/terminate), guarded OpenROAD sessions, synthetic ORFS runs, and optional filesystem-backed ORFS integration.
Interactive test scaffolding `tests/interactive/conftest.py`, `tests/interactive/test_session_manager.py`	Reworked autouse fixture to `reset_manager_singleton` using `pytest_asyncio`; reset singleton and run cleanup_all; updated tests to use `MagicMock`/`AsyncMock`, enum-based session state checks, and switched cleanup calls to `cleanup_all()`.
Performance test `tests/performance/test_benchmarks.py`	Relaxed session creation latency assertions to account for startup drain (~150ms): per-sample, average, and max thresholds increased.
Runtime manager behavior `src/openroad_mcp/core/manager.py`	Constructor signatures accept optional kwargs/max_sessions; init logs configured max; create_session now waits COMMAND_COMPLETION_DELAY * 1.5 and drains the session output buffer after starting a session and ensures subprocess termination on create failures.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Manager as OpenROADManager
    participant Session as InteractiveSession
    participant Buffer as OutputBuffer

    Client->>Manager: create_session(request)
    Manager->>Session: start subprocess / PTY
    Session-->>Manager: started
    Manager->>Manager: sleep(COMMAND_COMPLETION_DELAY * 1.5)
    Manager->>Buffer: drain(session.output_buffer)
    Buffer-->>Manager: drained
    Manager->>Client: session_ready(response)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Cleanup: remove unused subprocess pipe from manager #18 — Overlaps modifications to OpenROADManager constructor and session lifecycle behavior.
Interactive PTY session #7 — Related changes to interactive PTY/session subsystem, CircularBuffer/InteractiveSession, and manager integration.

Poem

🐇 I hopped through tests and danced with shells,

sessions warmed, then outputs cleared their bells.
Validation nap, drains tidy and bright,
ORFS images counted by soft moonlight.
A rabbit cheers: hoppity, delight!

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'Add ORFS Integration test' is accurate but incomplete. While the PR does add comprehensive ORFS integration tests (test_orfs_flows.py), it also includes significant fixture improvements (conftest.py refactoring), manager initialization changes, test infrastructure updates, and performance baseline adjustments across 6 modified files.
Docstring Coverage	✅ Passed	Docstring coverage is 91.67% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch integration-1

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

tests/integration/test_orfs_flows.py (1)

475-477: Hardcoded image count 4 couples the assertion to the fixture definition.

If stage_images in synthetic_orfs_run ever gains or loses a final_* entry, this assertion fails without an obvious error message.

♻️ Proposed fix

-        assert data["total_images"] == 4  # final_all, final_routing, final_congestion, final_placement
+        expected_final_count = sum(1 for f in synthetic_orfs_run["images"] if f.startswith("final_"))
+        assert data["total_images"] == expected_final_count

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/integration/test_orfs_flows.py` around lines 475 - 477, Replace the
hardcoded total_images assertion with a dynamic calculation derived from
data["images_by_stage"]: compute expected_total as the sum of lengths of all
image lists whose stage key starts with "final" (or simply sum all lists if that
better matches fixture semantics), then assert data["total_images"] ==
expected_total and include a helpful message; reference data["images_by_stage"]
and the synthetic_orfs_run fixture's stage_images when making this change.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/integration/test_orfs_flows.py`:
- Around line 391-409: The subprocess env is being overwritten: instead of
passing a minimal dict to StdioServerParameters, create the env by copying
os.environ and updating the two keys ("ORFS_FLOW_PATH" and
"OPENROAD_ENABLE_COMMAND_VALIDATION") so PATH/HOME etc. are preserved; also
remove the duplicated cancel-scope RuntimeError handling in this fixture and
instead call a shared async helper (e.g., _mcp_session) that encapsulates the
try/except and the async with stdio_client(...) / ClientSession(...) /
session.initialize() / asyncio.sleep(1.0) flow so both this test fixture and the
other usage reuse the same cancel-scope guard.
- Around line 38-46: The docstring for get_orfs_flow_path is incorrect:
skip_if_no_orfs calls get_orfs_flow_path() at import time (via
pytest.mark.skipif) so the existence check happens during module import, not
runtime; either change the docstring to say "evaluated at import/collection
time" or make the skip lazy by removing skip_if_no_orfs and performing the
existence check at test runtime (e.g., call get_orfs_flow_path().exists() inside
a fixture or call pytest.skip() in setup), updating references to
skip_if_no_orfs and usages accordingly; touch get_orfs_flow_path,
skip_if_no_orfs, and any tests that rely on the current behavior.
- Around line 531-543: The two tests test_orfs_platforms_discoverable and
test_orfs_gcd_design_exists declare the mcp_client fixture but never use it;
remove the unused fixture parameter from both function signatures so they become
plain async def test_orfs_platforms_discoverable() and async def
test_orfs_gcd_design_exists(), verify no other references to mcp_client exist in
those functions, run the test suite to ensure nothing else depends on the
fixture being present, and if any linter/test warnings remain, remove or adjust
any unused-imports or fixture mentions related to these two tests.

---

Nitpick comments:
In `@tests/integration/test_orfs_flows.py`:
- Around line 475-477: Replace the hardcoded total_images assertion with a
dynamic calculation derived from data["images_by_stage"]: compute expected_total
as the sum of lengths of all image lists whose stage key starts with "final" (or
simply sum all lists if that better matches fixture semantics), then assert
data["total_images"] == expected_total and include a helpful message; reference
data["images_by_stage"] and the synthetic_orfs_run fixture's stage_images when
making this change.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d5a4e36 and 08e0507.

📒 Files selected for processing (3)

ROADMAP.md
tests/integration/conftest.py
tests/integration/test_orfs_flows.py

codecov · 2026-02-23T19:56:43Z

Codecov Report

❌ Patch coverage is 54.54545% with 5 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
src/openroad_mcp/core/manager.py	54.54%	5 Missing ⚠️

📢 Thoughts on this report? Let us know!

Signed-off-by: Jack Luar <39641663+luarss@users.noreply.github.com>

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/openroad_mcp/core/manager.py`:
- Around line 73-74: The warm-up awaits (await asyncio.sleep(...) and await
session.output_buffer.drain_all()) run after session.start() has already spawned
the subprocess, so if they raise the current rollback only deletes
_sessions[session_id] and leaves the child process running; wrap the warm-up
steps in a try/except/finally so that on any exception you explicitly stop the
started process (call session.terminate() or, if no high-level API exists,
session.process.kill() and await session.process.wait()), then remove the
session from the _sessions dict (the same deletion logic now at Lines 81-85) and
re-raise the exception; ensure you reference session.start(),
session.output_buffer.drain_all(), and the _sessions[session_id] removal in your
changes.

In `@tests/integration/conftest.py`:
- Around line 13-28: The suppression of RuntimeError currently wraps the entire
_mcp_session, so move the try/except to only cover post-yield cleanup: inside
_mcp_session open stdio_client and ClientSession, await session.initialize(),
then yield session (no outer try/except). After the yield, perform the cleanup
inside a try/except that catches RuntimeError and only suppresses it when
"cancel scope" appears in the error string; re-raise other errors. Reference
symbols: _mcp_session, stdio_client, ClientSession, session.initialize, and the
yield session to locate where to relocate the exception handling.

In `@tests/integration/test_orfs_flows.py`:
- Around line 320-331: The tests calling mcp_client.call_tool and parsing via
_parse_tool_result (in test_openroad_version_command and the related timing
test) currently only assert non-empty output and echoed session_id, which allows
error payloads to pass; modify the assertions to require data.get("error") is
None and also assert presence of at least one expected marker in the output
(e.g., a version pattern like "2." or "OpenROAD" for
test_openroad_version_command, and the exact session_id plus no error for
test_openroad_timing_commands) so failures in command execution fail the tests;
update assertions that reference exec_result, data, output, and session_id
accordingly.
- Line 22: Replace the runtime generation of a 1×1 WEBP image in
tests/integration/test_orfs_flows.py (the fixture at lines ~414–417 that
currently uses PIL.Image to create and save a WEBP) with a checked-in fixture:
either load a committed .webp file via
Image.open("tests/fixtures/one_pixel.webp") or embed the WEBP bytes in the test
(opened with io.BytesIO) so the test no longer requires the Pillow WEBP codec at
runtime; update any references to the generated image variable to use the opened
fixture instead.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2e95f687-8765-4f48-a73a-ef6ec46f978f

📥 Commits

Reviewing files that changed from the base of the PR and between 08e0507 and c0b27f9.

📒 Files selected for processing (3)

src/openroad_mcp/core/manager.py
tests/integration/conftest.py
tests/integration/test_orfs_flows.py

Signed-off-by: Jack Luar <39641663+luarss@users.noreply.github.com>

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

tests/interactive/test_session_manager.py (1)

79-86: Consider extracting repeated PTY mock setup to a shared fixture.

The same mock configuration pattern appears in 13 test methods:

mock_pty = AsyncMock()
mock_pty.is_process_alive = MagicMock(return_value=True)
mock_pty.wait_for_exit.return_value = None
mock_pty.read_output.return_value = None
mock_pty_class.return_value = mock_pty

This could be extracted to a fixture in conftest.py to reduce duplication and simplify future maintenance.

♻️ Example fixture extraction

Add to conftest.py:

`@pytest.fixture`
def mock_pty_handler():
    """Create a pre-configured PTY mock."""
    with patch("openroad_mcp.interactive.session.PTYHandler") as mock_pty_class:
        mock_pty = AsyncMock()
        mock_pty.is_process_alive = MagicMock(return_value=True)
        mock_pty.wait_for_exit.return_value = None
        mock_pty.read_output.return_value = None
        mock_pty_class.return_value = mock_pty
        yield mock_pty

Then tests simplify to:

async def test_create_session_with_params(self, mock_pty_handler, session_manager, tmp_path):
    session_id = await session_manager.create_session(...)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/interactive/test_session_manager.py` around lines 79 - 86, Extract the
repeated PTY mock setup into a pytest fixture (e.g., mock_pty_handler) that
patches PTYHandler and returns the pre-configured AsyncMock (mock_pty) with
is_process_alive, wait_for_exit, and read_output stubbed, then update tests that
currently create mock_pty/mock_pty_class to accept that fixture and remove the
duplicated setup; ensure the fixture uses patch to set PTYHandler.return_value
to the AsyncMock so existing test code that relies on PTYHandler continues to
work and all references to mock_pty, mock_pty_class, and PTYHandler are
preserved.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/interactive/conftest.py`:
- Around line 8-16: The fixture reset_manager_singleton currently nukes
OpenROADManager._instance directly which leaks background tasks and resources;
convert it into an async pytest fixture (async def reset_manager_singleton) and
instead call await OpenROADManager.cleanup_all() to properly stop _reader_task,
_writer_task, _exit_monitor_task and close sessions before nulling _instance;
after yield call await OpenROADManager.cleanup_all() again (or check instance
exists) then set OpenROADManager._instance = None and run gc.collect() to ensure
asyncio.Lock and PTY/session resources are released; reference the
OpenROADManager class and its cleanup_all() method when making the change.

---

Nitpick comments:
In `@tests/interactive/test_session_manager.py`:
- Around line 79-86: Extract the repeated PTY mock setup into a pytest fixture
(e.g., mock_pty_handler) that patches PTYHandler and returns the pre-configured
AsyncMock (mock_pty) with is_process_alive, wait_for_exit, and read_output
stubbed, then update tests that currently create mock_pty/mock_pty_class to
accept that fixture and remove the duplicated setup; ensure the fixture uses
patch to set PTYHandler.return_value to the AsyncMock so existing test code that
relies on PTYHandler continues to work and all references to mock_pty,
mock_pty_class, and PTYHandler are preserved.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: eb76efd8-2f95-4633-b8fb-0f35ed501248

📥 Commits

Reviewing files that changed from the base of the PR and between c0b27f9 and 71441ea.

📒 Files selected for processing (4)

src/openroad_mcp/core/manager.py
tests/interactive/conftest.py
tests/interactive/test_session_manager.py
tests/performance/test_benchmarks.py

🚧 Files skipped from review as they are similar to previous changes (1)

src/openroad_mcp/core/manager.py

- manager.py: terminate orphaned subprocess if warmup (sleep/drain_all) raises after session.start() succeeds - conftest.py (integration): narrow cancel-scope suppression to teardown only using _in_teardown flag, so test-body RuntimeErrors propagate - test_orfs_flows.py: assert error is None in OpenROAD command tests to catch failure payloads; replace PIL WEBP generation with hardcoded lossless bytes to remove codec dependency; drop unused io/PIL imports - conftest.py (interactive): convert reset_manager_singleton to async fixture and call cleanup_all() before resetting singleton to avoid leaking background tasks and PTY handles Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai

🧹 Nitpick comments (2)

tests/integration/test_orfs_flows.py (1)
414-419: Consider using ASCII x instead of × in the comment.

The comment "Minimal valid 1×1 black WebP" uses a Unicode multiplication sign (×) which triggered a static analysis warning. While this doesn't affect functionality, using the ASCII x would be more consistent.
♻️ Proposed fix
-        # Minimal valid 1×1 black WebP (lossless, codec-independent)
+        # Minimal valid 1x1 black WebP (lossless, codec-independent)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/test_orfs_flows.py` around lines 414 - 419, Update the
comment above the minimal_webp byte literal to replace the Unicode
multiplication sign "×" with the ASCII "x" so the comment reads "Minimal valid
1x1 black WebP"; locate the comment immediately preceding the minimal_webp
variable in tests/integration/test_orfs_flows.py and make the single-character
substitution to avoid the static analysis warning.
src/openroad_mcp/core/manager.py (1)
24-27: The **_kwargs parameter is unused and appears to be dead code.

The relevant snippet from src/openroad_mcp/server.py:28-29 shows the only instantiation is OpenROADManager() without any keyword arguments. This parameter will never receive any values.

If this was added for future extensibility or testing purposes, consider documenting that intent. Otherwise, remove it to avoid confusion.
♻️ Proposed fix if no future use is planned
-    def __new__(cls, **_kwargs: object) -> "OpenROADManager":
+    def __new__(cls) -> "OpenROADManager":
         if cls._instance is None:
             cls._instance = super().__new__(cls)
         return cls._instance
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openroad_mcp/core/manager.py` around lines 24 - 27, The __new__ method on
OpenROADManager currently declares an unused variadic keyword parameter
**_kwargs; remove this dead parameter by changing the signature of
OpenROADManager.__new__ to accept only (cls) and keep the singleton logic
unchanged; if the parameter was intentionally reserved for future extension or
tests, instead add a short docstring or comment on the __new__ method explaining
that intent rather than keeping an unused parameter.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/openroad_mcp/core/manager.py`:
- Around line 24-27: The __new__ method on OpenROADManager currently declares an
unused variadic keyword parameter **_kwargs; remove this dead parameter by
changing the signature of OpenROADManager.__new__ to accept only (cls) and keep
the singleton logic unchanged; if the parameter was intentionally reserved for
future extension or tests, instead add a short docstring or comment on the
__new__ method explaining that intent rather than keeping an unused parameter.

In `@tests/integration/test_orfs_flows.py`:
- Around line 414-419: Update the comment above the minimal_webp byte literal to
replace the Unicode multiplication sign "×" with the ASCII "x" so the comment
reads "Minimal valid 1x1 black WebP"; locate the comment immediately preceding
the minimal_webp variable in tests/integration/test_orfs_flows.py and make the
single-character substitution to avoid the static analysis warning.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c0a11966-e4f2-425f-874d-cb5ce719942c

📥 Commits

Reviewing files that changed from the base of the PR and between 71441ea and 2c5b2ed.

📒 Files selected for processing (4)

src/openroad_mcp/core/manager.py
tests/integration/conftest.py
tests/integration/test_orfs_flows.py
tests/interactive/conftest.py

luarss and others added 2 commits February 23, 2026 18:48

Mark integration tests as complete in roadmap

08e0507

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai bot reviewed Feb 23, 2026

View reviewed changes

Comment thread tests/integration/test_orfs_flows.py

Comment thread tests/integration/test_orfs_flows.py Outdated

Comment thread tests/integration/test_orfs_flows.py Outdated

luarss added 2 commits March 10, 2026 02:57

refactor client session into fixtures

948ecf4

Signed-off-by: Jack Luar <39641663+luarss@users.noreply.github.com>

allow startup output to drain

c0b27f9

Signed-off-by: Jack Luar <39641663+luarss@users.noreply.github.com>

coderabbitai bot reviewed Mar 10, 2026

View reviewed changes

Comment thread src/openroad_mcp/core/manager.py

Comment thread tests/integration/conftest.py

Comment thread tests/integration/test_orfs_flows.py Outdated

Comment thread tests/integration/test_orfs_flows.py

luarss added 2 commits March 10, 2026 03:48

unskip tests

96c3a40

Signed-off-by: Jack Luar <39641663+luarss@users.noreply.github.com>

increase test-performance threshold

71441ea

Signed-off-by: Jack Luar <39641663+luarss@users.noreply.github.com>

coderabbitai bot reviewed Mar 10, 2026

View reviewed changes

Comment thread tests/interactive/conftest.py Outdated

coderabbitai bot reviewed Mar 10, 2026

View reviewed changes

Conversation

luarss commented Feb 23, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Feb 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov bot commented Feb 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

luarss commented Feb 23, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 23, 2026 •

edited

Loading

codecov bot commented Feb 23, 2026 •

edited

Loading