
Enable ACPAgent on RemoteRuntime API via ACP conversation endpoints #2465

Merged
simonrosenberg merged 8 commits into main from feat/acp-remote-runtime-acp-endpoints
Mar 16, 2026

Conversation

@simonrosenberg
Collaborator

@simonrosenberg simonrosenberg commented Mar 16, 2026

Summary

  • replay the ACP remote-runtime support from #2190 (Enable ACPAgent on RemoteRuntime API) onto current main
  • keep the existing /api/conversations REST contract stable and Agent-only
  • add ACP-capable conversation endpoints only for the four affected schema-sensitive operations under /api/acp/conversations
  • route RemoteConversation to the ACP endpoints only for ACP conversation creation and conversation-info reads

Why this approach

#2190 was useful, but it widened the public v1 REST contract by registering ACPAgent at the AgentBase boundary. That changed the OpenAPI schema and caused older clients that posted a plain agent object without kind to start failing with 422.
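
The failure mode described above can be illustrated with a stdlib-only sketch. It is not the server's code: `parse_agent_legacy`, `parse_agent_polymorphic`, the `ValidationError` class, and the `llm` payload field are hypothetical stand-ins for what a discriminated union does at validation time.

```python
# Hypothetical stand-ins: these names are illustrative, not the agent-server API.
KNOWN_KINDS = {"Agent", "ACPAgent"}


class ValidationError(Exception):
    """Raised where an HTTP framework would answer 422 Unprocessable Entity."""


def parse_agent_legacy(payload: dict) -> dict:
    # Agent-only contract: there is one possible type, so `kind` is optional.
    kind = payload.get("kind", "Agent")
    if kind != "Agent":
        raise ValidationError(f"legacy contract only accepts Agent, got {kind!r}")
    return {**payload, "kind": "Agent"}


def parse_agent_polymorphic(payload: dict) -> dict:
    # Polymorphic contract: the discriminator is required to pick a type.
    kind = payload.get("kind")
    if kind not in KNOWN_KINDS:
        raise ValidationError(f"missing or unknown discriminator: {kind!r}")
    return dict(payload)


# An older client posts a plain agent object without `kind` (field is made up).
old_client_payload = {"llm": {"model": "example-model"}}

legacy_result = parse_agent_legacy(old_client_payload)
try:
    parse_agent_polymorphic(old_client_payload)
    polymorphic_ok = True
except ValidationError:
    polymorphic_ok = False
```

Under these assumptions the same payload is accepted by the Agent-only parser and rejected by the polymorphic one, which is why keeping the two contracts on separate routes avoids breaking older clients.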

This PR keeps the legacy contract intact and limits the new polymorphic contract to the endpoints that actually need it:

  • POST /api/acp/conversations
  • GET /api/acp/conversations/{conversation_id}
  • GET /api/acp/conversations
  • GET /api/acp/conversations/search

Everything else stays on the legacy /api/conversations surface.

Included from #2190

  • ACPAgent remote-runtime support and lifecycle cleanup
  • ACP Docker image provisioning and examples
  • ACP eval workflow plumbing
  • ACP runtime fixes and tests from the original feature branch

Compatibility notes

  • /api/conversations remains backward-compatible and keeps the old Agent request/response shape
  • ACP-capable request/response schemas are isolated to /api/acp/conversations
  • RemoteConversation uses ACP endpoints only where the contract differs; events, run, pause, secrets, and related actions stay on the existing routes
  • conversation webhooks keep the legacy ConversationInfo payload for legacy Agent conversations and use the ACP shape only for ACP conversations

Test plan

  • uv run pytest tests/agent_server/test_conversation_router.py tests/agent_server/test_conversation_router_acp.py tests/agent_server/test_openapi_discriminator.py tests/sdk/conversation/remote/test_remote_conversation.py tests/sdk/agent/test_acp_agent.py
  • uv run pytest tests/sdk/agent/test_acp_agent.py tests/agent_server/test_conversation_service.py
  • uv run pre-commit run --files openhands-agent-server/openhands/agent_server/api.py openhands-agent-server/openhands/agent_server/conversation_router_acp.py openhands-agent-server/openhands/agent_server/conversation_service.py openhands-agent-server/openhands/agent_server/models.py openhands-sdk/openhands/sdk/agent/acp_agent.py openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py tests/agent_server/test_conversation_router.py tests/agent_server/test_conversation_router_acp.py tests/agent_server/test_conversation_service.py tests/agent_server/test_openapi_discriminator.py tests/sdk/agent/test_acp_agent.py tests/sdk/conversation/remote/test_remote_conversation.py

Supersedes the accidentally closed #2461.


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
|---|---|---|---|
| java | amd64, arm64 | eclipse-temurin:17-jdk | Link |
| python | amd64, arm64 | nikolaik/python-nodejs:python3.13-nodejs22 | Link |
| golang | amd64, arm64 | golang:1.21-bookworm | Link |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:37a17aa-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-37a17aa-python \
  ghcr.io/openhands/agent-server:37a17aa-python

All tags pushed for this build

ghcr.io/openhands/agent-server:37a17aa-golang-amd64
ghcr.io/openhands/agent-server:37a17aa-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:37a17aa-golang-arm64
ghcr.io/openhands/agent-server:37a17aa-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:37a17aa-java-amd64
ghcr.io/openhands/agent-server:37a17aa-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:37a17aa-java-arm64
ghcr.io/openhands/agent-server:37a17aa-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:37a17aa-python-amd64
ghcr.io/openhands/agent-server:37a17aa-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-amd64
ghcr.io/openhands/agent-server:37a17aa-python-arm64
ghcr.io/openhands/agent-server:37a17aa-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-arm64
ghcr.io/openhands/agent-server:37a17aa-golang
ghcr.io/openhands/agent-server:37a17aa-java
ghcr.io/openhands/agent-server:37a17aa-python

About Multi-Architecture Support

  • Each variant tag (e.g., 37a17aa-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 37a17aa-python-amd64) are also available if needed

Docs

simonrosenberg and others added 3 commits March 16, 2026 11:48
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions
Contributor

github-actions bot commented Mar 16, 2026

Python API breakage checks — ✅ PASSED

Result: PASSED

Action log

@github-actions
Contributor

github-actions bot commented Mar 16, 2026

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: PASSED

Action log

@github-actions
Contributor

github-actions bot commented Mar 16, 2026

Coverage

Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| openhands-agent-server/openhands/agent_server | | | | |
| api.py | 168 | 12 | 92% | 74, 86, 101, 107, 278, 281, 285–287, 289, 295, 336 |
| conversation_router.py | 113 | 9 | 92% | 264, 322–325, 337–340 |
| conversation_router_acp.py | 37 | 1 | 97% | 107 |
| conversation_service.py | 430 | 95 | 77% | 122–123, 150, 153, 155, 162–168, 196, 203, 224, 323, 329, 334, 340, 348–349, 358–361, 370, 384–386, 393, 426–427, 466, 469, 486–490, 492–493, 496–497, 500–505, 587, 594–598, 601–602, 606–610, 613–614, 618–622, 625–626, 632–637, 644–645, 649, 651–652, 657–658, 664–665, 672–673, 677–679, 697, 721, 953, 956 |
| event_service.py | 320 | 81 | 74% | 55–56, 74–76, 85–89, 92–95, 115, 219, 236, 290–291, 295, 303, 306, 352–353, 369, 371, 375–377, 381, 390–391, 393, 397, 403, 405, 413–418, 555, 557–558, 562, 576–578, 580, 584–587, 591–594, 602–605, 625, 629–634, 646–647, 649–650, 657–658, 660–661, 665, 671, 688–689 |
| openhands-sdk/openhands/sdk/agent | | | | |
| acp_agent.py | 389 | 78 | 79% | 200–202, 256–259, 261–262, 289, 291, 295, 301, 312–313, 318, 385, 487–488, 499, 504, 535, 545, 550, 561–564, 570–572, 575–577, 579, 581–582, 584, 586, 591, 600–601, 605–606, 610, 617–623, 633–638, 640, 649–651, 654–655, 661–665, 667, 669–670, 678, 715, 719–720, 963–964 |
| base.py | 187 | 22 | 88% | 200, 257–259, 289, 293–297, 345–347, 357, 367, 375–376, 480, 517–518, 528–529 |
| openhands-sdk/openhands/sdk/conversation/impl | | | | |
| local_conversation.py | 400 | 26 | 93% | 288, 293, 321, 364, 382, 398, 463, 641–642, 645, 797, 805, 807, 811–812, 823, 825–827, 852, 924, 1050, 1054, 1124, 1131–1132 |
| remote_conversation.py | 613 | 104 | 83% | 77, 79, 150, 177, 190, 192–195, 205, 227–228, 233–236, 319, 329–331, 337, 378, 520–523, 525, 545–549, 554–557, 560, 572–576, 732–733, 737–738, 752, 776–777, 796, 807–808, 828–831, 833–834, 858–860, 863–867, 869–870, 874, 876–884, 886, 923, 1053, 1121–1122, 1126, 1131–1135, 1141–1147, 1160–1161, 1247, 1254, 1260–1261, 1339–1340 |
| TOTAL | 20838 | 5173 | 75% | |

Collaborator

@all-hands-bot all-hands-bot left a comment

🟡 Taste Rating: Acceptable

Overall Assessment: Clean separation of contracts (v1 vs ACP) with good backward compatibility strategy. The data structure choices (discriminated unions with Tags) are solid. However, this PR changes agent behavior in ways that will definitely affect eval benchmarks, so it needs human review before merge.

Key Insight: The architecture correctly avoids the "widen the base contract and break everyone" trap by introducing parallel ACP endpoints. But the FinishAction-every-turn semantic change means this PR should not be auto-approved - someone needs to validate benchmark impact.

Verdict: 🔴 Needs human review (eval behavior changes)

Collaborator

@all-hands-bot all-hands-bot left a comment

🟡 Taste Rating: Acceptable

Overall Assessment: Clean separation of contracts (v1 vs ACP) with good backward compatibility strategy. The data structure choices (discriminated unions with Tags) are solid. However, this PR changes agent behavior in ways that will definitely affect eval benchmarks, so it needs human review before merge.

Key Insight: The architecture correctly avoids the "widen the base contract and break everyone" trap by introducing parallel ACP endpoints. But the ACPAgent's turn-based semantics (emitting FinishAction after each step) is a fundamental behavioral difference from the base Agent that must be verified against benchmarks before shipping.


Critical Review Points

🟠 Important - Eval Risk (.github/workflows/run-eval.yml)

This PR adds agent_type parameter to the eval workflow, which is good infrastructure. However, ACPAgent has fundamentally different step semantics than the base Agent (emits FinishAction after each turn, handles tool calls differently via ACP protocol).

Per repo guidelines: "Do NOT submit an APPROVE review when the PR changes agent behavior or anything that could plausibly affect benchmark/evaluation performance."

Action Required: Human maintainer should run lightweight evals comparing ACPAgent vs base Agent on a small benchmark subset (e.g., 10 SWE-bench instances) before merging. Add the integration-test label to trigger benchmark runs.


🟢 Acceptable - Data Structure (openhands-agent-server/openhands/agent_server/models.py)

Using Discriminator with Tag annotations for ACPEnabledAgent is the right choice for polymorphic agent types. The explicit kind field makes deserialization unambiguous and plays well with OpenAPI schema generation.

This is cleaner than the alternative approaches (manual validation, union without discriminator) and follows Pydantic best practices.


🟡 Suggestion - Type Safety (openhands-agent-server/openhands/agent_server/conversation_service.py)

The cast() usage in search_conversations (line 172) is necessary given the contract constraints, but it's a bit fragile. The type checker trusts you that all items are ConversationInfo.

Alternative: Return a typed tuple from _search_conversations, but this creates its own complexity with union return types. The current approach is pragmatic - just be aware the cast bypasses type safety.
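
To make the "cast bypasses type safety" point concrete, here is a small stdlib sketch. The two classes are hypothetical stand-ins for the response models; only the behavior of `typing.cast` itself is being demonstrated.

```python
from typing import cast


class ConversationInfo:  # hypothetical stand-in for the legacy response model
    pass


class ACPConversationInfo:  # hypothetical stand-in for the ACP response model
    pass


mixed: list[object] = [ConversationInfo(), ACPConversationInfo()]

# cast() only informs the type checker; at runtime it returns its argument unchanged.
trusted = cast(list[ConversationInfo], mixed)
cast_is_noop = trusted is mixed

# A runtime-checked alternative: keep only items that really have the expected type.
checked = [item for item in mixed if isinstance(item, ConversationInfo)]
```

The cast version still contains the ACP item, so correctness depends entirely on the caller having filtered beforehand. The `isinstance` version pays a small runtime cost in exchange for an actual guarantee.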


🟢 Acceptable - Semantic Difference (openhands-sdk/openhands/sdk/agent/acp_agent.py)

Emitting FinishAction after each step() is a documented behavioral difference from the base Agent. For ACP, one step = one complete remote assistant turn, so this makes sense.

Good: This was called out in a previous review and is now documented both at the class level and inline.

Eval Impact: This changes the event stream compared to base Agent, which could affect:

  • Metrics counting (more finish events)
  • Downstream consumers expecting different event patterns
  • Benchmark evaluation logic

Must be verified in evals before shipping.
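
A toy model of the two event streams shows why finish-event counts could diverge. Everything here is an assumption for illustration: the generator functions, the `ActionEvent` label, and the idea that the base Agent finishes once at the end of a run are not taken from the SDK source.

```python
from collections import Counter


def base_agent_events(steps: int) -> list[str]:
    # Illustrative base Agent: several steps, one finish at the very end.
    return ["ActionEvent"] * steps + ["FinishAction"]


def acp_agent_events(turns: int, tool_calls_per_turn: int) -> list[str]:
    # Illustrative ACPAgent: each step() is one complete remote turn
    # and therefore ends with its own FinishAction.
    events: list[str] = []
    for _ in range(turns):
        events += ["ACPToolCallEvent"] * tool_calls_per_turn
        events.append("FinishAction")
    return events


base_counts = Counter(base_agent_events(steps=3))
acp_counts = Counter(acp_agent_events(turns=3, tool_calls_per_turn=2))
```

In this model, a consumer that treats the first `FinishAction` as "run complete" would stop after one turn on the ACP stream, which is the kind of downstream assumption the eval run is meant to surface.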


🟡 Suggestion - Test Coverage (tests/agent_server/test_conversation_router_acp.py)

The tests verify the OpenAPI contract (request/response schemas) but don't test real ACP behavior. They mock conversation_service.start_acp_conversation and assert it was called.

What's missing: An integration test that:

  1. Starts a real (or fake-but-realistic) ACP subprocess
  2. Creates a conversation via the API
  3. Sends a message and runs to completion
  4. Verifies the event stream contains expected ACPToolCallEvents + FinishAction

The example 09_acp_agent_with_remote_runtime.py provides some E2E coverage, but in-tree integration tests would be better for catching regressions.

Acceptable for now if this is meant as a fast follow-up for integration tests, but flag it as tech debt.


Verdict

Needs human review + lightweight evals - Not because the code is broken, but because agent behavior changes require eval verification per repo policy.

Next Steps:

  1. Add integration-test label to trigger benchmark runs
  2. Run lightweight eval comparing ACPAgent vs base Agent on ~10 SWE-bench instances
  3. Review benchmark results for unexpected regressions
  4. If benchmarks look good, merge with confidence

Collaborator

enyst commented Mar 16, 2026

(OpenHands-GPT-5.4)

I reviewed the PR description and diff end-to-end. Overall, the route split looks like the right way to replay #2190 without widening the legacy /api/conversations contract.

One thing I noticed that seems worth clarifying:

  • POST/GET/batch/search got ACP-capable counterparts under /api/acp/conversations, which makes sense because those are the schema-sensitive operations.
  • But /api/conversations/count is not schema-sensitive, and in this PR it now only counts legacy Agent conversations because ConversationService.count_conversations() hardcodes include_acp=False.
  • I also don’t see a parallel /api/acp/conversations/count route.

So after this change, ACP conversations appear to become invisible to the only count endpoint we expose, even though counting itself does not require the legacy Agent-only schema.

Was that exclusion intentional?

If not, I think the simplest fix would be to keep /api/conversations/count inclusive of both legacy and ACP conversations (or add an ACP count endpoint if you want strict surface symmetry). Right now the split feels slightly inconsistent: the schema-sensitive reads were handled carefully, but the scalar count endpoint seems to have inherited the filtering anyway.
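
The effect of the hardcoded flag can be sketched with an in-memory store. `StoredConversation` and this free-standing `count_conversations` are hypothetical. Only the `include_acp` parameter name comes from the comment above.

```python
from dataclasses import dataclass


@dataclass
class StoredConversation:  # hypothetical in-memory record
    conversation_id: str
    agent_kind: str  # "Agent" or "ACPAgent"


def count_conversations(store: list[StoredConversation], include_acp: bool) -> int:
    # With include_acp=False, ACP conversations are filtered out before counting.
    return sum(1 for c in store if include_acp or c.agent_kind != "ACPAgent")


store = [
    StoredConversation("a", "Agent"),
    StoredConversation("b", "ACPAgent"),
    StoredConversation("c", "ACPAgent"),
]

legacy_only = count_conversations(store, include_acp=False)
inclusive = count_conversations(store, include_acp=True)
```

Because the result is a plain integer, neither variant touches the agent schema, so the filter is a policy choice rather than a contract requirement.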

Collaborator

enyst commented Mar 16, 2026

(OpenHands-GPT-5.4)

Here’s a fuller review summary of PR #2465.

Overall assessment

I think the core design is good.

It directly addresses the real regression from #2190:

  • keep /api/conversations backward-compatible and Agent-only
  • avoid widening the legacy AgentBase REST contract
  • add a parallel ACP-capable contract only where schema sensitivity actually mattered

That is a much cleaner approach than reintroducing ACP support by changing the existing AgentBase boundary again.

What the PR does

1. Adds a parallel ACP conversation API

New routes under:

  • POST /api/acp/conversations
  • GET /api/acp/conversations/{conversation_id}
  • GET /api/acp/conversations
  • GET /api/acp/conversations/search

These are the four operations that were affected by the AgentBase schema widening before.

2. Keeps legacy v1 REST contract stable

The existing /api/conversations routes stay pinned to:

  • StartConversationRequest.agent: Agent
  • ConversationInfo.agent: Agent

So legacy clients can still:

  • omit agent.kind
  • keep seeing the old Agent-shaped response schema

3. Reintroduces ACPAgent remote-runtime support

The PR replays the useful ACP runtime work from #2190, including:

  • ACPAgent restore/resume behavior
  • ACP subprocess lifecycle cleanup
  • Docker image provisioning for ACP servers
  • examples
  • eval workflow plumbing

4. Updates RemoteConversation path selection

RemoteConversation now chooses paths based on agent type:

  • ACPAgent uses /api/acp/conversations for creation/info reads
  • actions/events still use legacy /api/conversations/...

That matches the stated goal: only switch the endpoints whose schemas differ.

SDK remote conversation

Main file:

  • openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py

What changed:

  • introduced separate base paths:
    • LEGACY_CONVERSATIONS_PATH = "/api/conversations"
    • ACP_CONVERSATIONS_PATH = "/api/acp/conversations"
  • create/get/info calls switch based on whether the local agent is ACPAgent
  • action/event endpoints remain on legacy routes
  • remote agent deserialization now explicitly handles ACPAgent payloads

This is the key SDK-side change that makes the split usable.
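
The path selection described above could look roughly like the following sketch. The two constants are the ones named in the list. The `conversation_path` helper, the operation labels, and the `/{id}/{operation}` sub-path shape are assumptions, not the SDK's actual code.

```python
LEGACY_CONVERSATIONS_PATH = "/api/conversations"
ACP_CONVERSATIONS_PATH = "/api/acp/conversations"

# Operations whose request/response schema differs between contracts
# (hypothetical labels chosen for this sketch).
SCHEMA_SENSITIVE = {"create", "info"}


def conversation_path(operation, is_acp_agent, conversation_id=None):
    """Return the REST path for an operation, switching base path only when needed."""
    use_acp = is_acp_agent and operation in SCHEMA_SENSITIVE
    base = ACP_CONVERSATIONS_PATH if use_acp else LEGACY_CONVERSATIONS_PATH
    if operation == "create":
        return base
    if operation == "info":
        return f"{base}/{conversation_id}"
    # Actions and events stay on the legacy surface for every agent type.
    return f"{base}/{conversation_id}/{operation}"
```

For example, `conversation_path("run", True, "123")` yields a legacy path even for an ACP agent, while `conversation_path("info", True, "123")` yields the ACP one.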

New public REST surface

New public paths:

  • /api/acp/conversations
  • /api/acp/conversations/{conversation_id}
  • /api/acp/conversations/search

These are additive, so that part is okay from a compatibility perspective.

What looks good

1. It fixes the original compatibility problem the right way

Instead of making AgentBase wider everywhere, it isolates the polymorphic contract.

2. The OpenAPI intent is explicitly tested

tests/agent_server/test_openapi_discriminator.py checks the key invariant:

  • v1 routes stay Agent-only
  • ACP routes expose the union

That’s the right test to have.
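
As a rough picture of what such an invariant check looks like, the sketch below inspects a hand-written, heavily trimmed OpenAPI fragment. The fragment and the `agent_refs` helper are hypothetical and do not reproduce the server's real schema or the actual test.

```python
# Hand-written OpenAPI fragment for illustration only (not the real schema).
openapi = {
    "components": {
        "schemas": {
            "ConversationInfo": {
                "properties": {"agent": {"$ref": "#/components/schemas/Agent"}}
            },
            "ACPConversationInfo": {
                "properties": {
                    "agent": {
                        "oneOf": [
                            {"$ref": "#/components/schemas/Agent"},
                            {"$ref": "#/components/schemas/ACPAgent"},
                        ],
                        "discriminator": {"propertyName": "kind"},
                    }
                }
            },
        }
    }
}


def agent_refs(schema_name: str) -> set[str]:
    """Collect the schema names the `agent` property may refer to."""
    agent = openapi["components"]["schemas"][schema_name]["properties"]["agent"]
    variants = agent.get("oneOf", [agent])
    return {v["$ref"].rsplit("/", 1)[-1] for v in variants}
```

An invariant test would then assert that the legacy shape resolves to `Agent` alone and that the ACP shape resolves to the union.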

3. SDK routing change is narrow

RemoteConversation only switches where it has to.

4. Webhook compatibility was handled thoughtfully

The follow-up fix in the PR makes the behavior much more reasonable.

Risks / concerns / follow-ups

1. Count endpoint inconsistency

This is the comment I already posted.

2. Possible attach/reuse edge case with mismatched contract

From the code, it looks like this edge case may be problematic:

  • client uses legacy Agent
  • supplies conversation_id that already belongs to an ACP conversation
  • legacy GET path returns 404
  • create path reuses existing event service by ID
  • _compose_conversation_info_v1() asserts stored.agent is an Agent

That suggests a potential bad path for “legacy client tries to attach to existing ACP conversation ID”.

This may be an unsupported case, but if so it would be better to reject it explicitly than fail via assertion.
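
The difference between failing via assertion and rejecting explicitly can be sketched as follows. All names are hypothetical stand-ins. `ContractMismatchError` is an invented exception that a router could map to a 4xx response.

```python
class Agent:  # hypothetical stand-in for the SDK class
    pass


class ACPAgent:  # hypothetical stand-in for the SDK class
    pass


class ContractMismatchError(Exception):
    """Hypothetical error a router could translate into a 4xx response."""


def compose_info_with_assert(stored_agent: object) -> str:
    # Fails with a bare AssertionError, and the check is removed under `python -O`.
    assert isinstance(stored_agent, Agent)
    return "legacy ConversationInfo"


def compose_info_with_rejection(stored_agent: object) -> str:
    # Explicit rejection: always enforced and carries an actionable message.
    if not isinstance(stored_agent, Agent):
        raise ContractMismatchError(
            "conversation was created via the ACP contract; "
            "use /api/acp/conversations"
        )
    return "legacy ConversationInfo"
```

The explicit version gives the client an error it can act on, whereas an `AssertionError` would normally surface as an opaque server error.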

3. No obvious docs update

Per the package guidance, this probably deserves documentation updates for:

  • new /api/acp/conversations endpoints
  • ACP remote runtime usage
  • RemoteConversation contract selection behavior

I didn’t see docs changes in this PR.

4. Model duplication maintenance cost

ConversationInfo no longer inherits ConversationState; instead the PR manually mirrors lots of fields into _ConversationInfoBase.

That’s understandable for schema control, but it creates a maintenance risk:

  • future ConversationState field additions may need to be mirrored manually into both conversation info shapes

5. ACP behavior changes still need human eval judgment

I agree with the existing all-hands-bot caution here:

That does not make the PR wrong, but it does make benchmark validation worthwhile before merge.

Test coverage impression

What I did not notice strong coverage for:

  • ACP/legacy attach-by-existing-ID mismatch
  • count semantics across legacy + ACP conversations

@enyst
Collaborator

enyst commented Mar 16, 2026

HUMAN: one tiny thing: maybe we shouldn’t use “legacy” for the name of the Agent endpoints; they’re not legacy, they’re … standard? regular? 😅

Update: actually, not sure. Since the new endpoints encode the discriminator, maybe we could use them at some point going forward, instead of the old ones? That would mean deprecating the existing endpoints for a time, then removing them, and keeping only the acp/ ones (if I saw this right that it will work for regular conversations).

🤔 But unless we have strong feelings about this, I think... I think right now I wouldn't. The existing endpoints work, they're essential, they're well known, they don't have acp in the name if they're not acp.

@simonrosenberg
Collaborator Author

ACP Validation Results ✅

Successfully validated the ACP implementation with benchmarks using:

  • SDK: feat/acp-remote-runtime-acp-endpoints
  • Evaluation: feat/acp-agent
  • Benchmarks: feat/acp-agent

Results

| Benchmark | Status | Instances | Resolve Rate |
|---|---|---|---|
| swebench | ✅ Complete | 5/5 | 100% |
| swtbench | ✅ Complete | 5/5 | 80% |
| commit0 | ⚠️ Partial | 3/5 | 60% (infra issues) |

ACP Evidence

  • ACP tool calls logged: (acp:7) to (acp:53) per instance
  • workspace_keepalive working correctly
  • Claude ACP settings written to ~/.claude/settings.json
  • Remote conversations completed via WebSocket

commit0 Note

The 2 incomplete commit0 instances failed due to MCP JSONRPC parsing failures (infrastructure issue), not ACP bugs:

pydantic_core ValidationError: Invalid JSON: EOF while parsing a value

This is a runtime pod stability issue unrelated to the ACP implementation.

Workflow Run IDs

Conclusion: The ACP implementation is validated and working correctly. ✅

Collaborator

enyst commented Mar 16, 2026

Following up on point 4 from my previous summary (“Model duplication maintenance cost”): I think there is a lower-duplication alternative here if the main goal is just to keep the REST agent contract split without hand-mirroring most of ConversationState.

Instead of introducing _ConversationInfoBase and copying the common fields out of ConversationState, I think you could keep the response DTOs as subclasses of ConversationState and only narrow/override the agent field plus add the server metadata fields:

class ConversationInfo(ConversationState):
    agent: Agent
    title: str | None = None
    metrics: MetricsSnapshot | None = None
    created_at: datetime
    updated_at: datetime

class ACPConversationInfo(ConversationState):
    agent: ACPEnabledAgent
    title: str | None = None
    metrics: MetricsSnapshot | None = None
    created_at: datetime
    updated_at: datetime

I sanity-checked this locally in a small schema probe, and Pydantic does narrow the inherited agent field in the generated schema (Agent for the legacy shape, union for the ACP-capable shape), while still reusing the rest of the ConversationState fields.

That would keep the important separation:

  • internal/runtime state still uses AgentBase
  • legacy REST response stays Agent
  • ACP REST response stays polymorphic

but avoids having to manually re-copy the rest of the conversation-state surface into _ConversationInfoBase.

Tradeoff: this keeps the API DTOs more tightly coupled to ConversationState, so if the intent here was to deliberately decouple the REST shape from the runtime state model, then the current approach is more explicit. But if the main concern is just the agent contract split, subclassing ConversationState still seems like a simpler option worth considering.
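
The narrowing mechanism can be seen without Pydantic by looking at how Python resolves class annotations. This stdlib sketch uses empty stand-in classes and a made-up `workspace` field. It shows only that a subclass annotation overrides the inherited one while other fields carry over, not what Pydantic's schema generator emits.

```python
from typing import Union, get_type_hints


class AgentBase: ...
class Agent(AgentBase): ...
class ACPAgent(AgentBase): ...


class ConversationState:
    agent: AgentBase
    workspace: str  # hypothetical shared field that should not be re-declared


class ConversationInfo(ConversationState):
    agent: Agent  # narrowed for the legacy contract


class ACPConversationInfo(ConversationState):
    agent: Union[Agent, ACPAgent]  # polymorphic for the ACP contract


legacy_hints = get_type_hints(ConversationInfo)
acp_hints = get_type_hints(ACPConversationInfo)
```

Both subclasses inherit `workspace` unchanged and differ only in the `agent` annotation, which is the property the subclassing proposal relies on.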

@enyst
Collaborator

enyst commented Mar 16, 2026

HUMAN: just a quick pass through the agent's posts, it seems maybe we want to look at these just to be sure it is what we want: @simonrosenberg

  1. Count endpoint inconsistency

I'm not sure it's an issue, WDYT? #2465 (comment)

  2. Possible attach/reuse edge case with mismatched contract

The code suggests a potential bad path for “regular client tries to attach to existing ACP conversation ID”.

Maybe we could reject it explicitly (with an error) rather than fail via assertion.

  3. No obvious docs update

This is OK, we'll deal with it

  4. Model duplication maintenance cost

WDYT of this? #2465 (comment)

  5. eval

This is OK

@simonrosenberg
Collaborator Author

@enyst
1. I am adding /api/acp/conversations/count
2. yes agreed
3. agreed just a PR on docs repo
4. Unclear to me right now. I can make an issue for a future PR?
5. eval is covered: I triggered benchmark runs

@simonrosenberg
Collaborator Author

> @enyst
> 1. I am adding /api/acp/conversations/count
> 2. yes agreed
> 3. agreed just a PR on docs repo
> 4. Unclear to me right now. I can make an issue for a future PR?
> 5. eval is covered: I triggered benchmark runs

I implemented 1 and 2. Making a PR in the docs right now + making an issue for 4.

Also I am triggering small benchmarks with the latest code

Collaborator

@enyst enyst left a comment

Thank you for this!

Co-authored-by: openhands <openhands@all-hands.dev>
@simonrosenberg simonrosenberg merged commit d129025 into main Mar 16, 2026
25 of 26 checks passed
@simonrosenberg simonrosenberg deleted the feat/acp-remote-runtime-acp-endpoints branch March 16, 2026 20:18