Add AI SDK E2E integration tests #4390
Force-pushed from 5e53a77 to 3042899
Putting this up for discussion before I spend more time cleaning up any of the messy test lifecycle bits, especially server startup/teardown.
Force-pushed from c45a0ce to 53bc1e5
The AI SDK v6 tool approval flow uses three additional tool part states (approval-requested, approval-responded, output-denied) that were not defined in request_types.py. When the SDK client sent messages with these states, Pydantic validation rejected them. Add 6 new models (3 static + 3 dynamic) for the missing states, update the ToolUIPart/DynamicToolUIPart unions, register them in _TOOL_PART_TYPES, and handle output-denied as a terminal state in load_messages().
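To illustrate why the missing states led to rejected messages, here is a minimal stdlib-only sketch. It is not the actual request_types.py code: plain sets stand in for the Pydantic models and unions, `validate_state` is a hypothetical helper, and the four pre-existing state names are an assumption (only the three approval states are named in the commit message).

```python
# Hypothetical sketch: plain sets stand in for the Pydantic tool part models.

# Assumed pre-existing states (not listed in the commit message).
STATES_BEFORE = frozenset(
    {"input-streaming", "input-available", "output-available", "output-error"}
)

# The three states named in the commit message.
APPROVAL_STATES = frozenset(
    {"approval-requested", "approval-responded", "output-denied"}
)

STATES_AFTER = STATES_BEFORE | APPROVAL_STATES


def validate_state(state: str, known: frozenset[str]) -> str:
    """Return the state if it is known, otherwise raise like a failed validation."""
    if state not in known:
        raise ValueError(f"unknown tool part state: {state!r}")
    return state
```

With `STATES_BEFORE`, a message carrying `output-denied` raises; with `STATES_AFTER`, it passes through unchanged.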
Narrow the isinstance check to only approval-responded part types. Output-denied parts are already materialized by load_messages and must not be re-processed as deferred results.
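The narrowing described here can be sketched roughly as follows. The class and function names are hypothetical stand-ins, not the real pydantic-ai types; the point is only that the filter matches approval-responded parts and lets output-denied parts pass by untouched.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Union


@dataclass
class ApprovalRespondedPart:
    """Hypothetical stand-in for an approval-responded tool part."""

    tool_call_id: str
    approved: bool
    reason: Optional[str] = None


@dataclass
class OutputDeniedPart:
    """Hypothetical stand-in for an output-denied (terminal) tool part."""

    tool_call_id: str
    reason: Optional[str] = None


ToolPart = Union[ApprovalRespondedPart, OutputDeniedPart]


def iter_pending_approval_responses(
    parts: list[ToolPart],
) -> Iterator[ApprovalRespondedPart]:
    """Yield only approval-responded parts; output-denied parts are skipped."""
    for part in parts:
        # Narrow check: a broader isinstance over both classes would
        # re-process denials that have already been materialized.
        if isinstance(part, ApprovalRespondedPart):
            yield part
```

Given a history containing one denied call and one freshly answered call, only the freshly answered one is yielded as a deferred result.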
Reproduces pydantic#4387: the Pydantic AI server rejects messages with approval-requested, approval-responded, and output-denied tool states that the AI SDK v6 client produces during the tool approval lifecycle. The harness starts a real Starlette server with VercelAIAdapter and drives it from a node:test script using the AI SDK's AbstractChat.
Rename tests/js_integration/ to tests/ai_sdk/, convert to TypeScript, and add test cases for approval, denial, and denial with reason.
…narios Split tests into separate files per scenario with shared helpers. Use TestModel for natural agent behavior instead of hand-crafted stream deltas. Exclude tests/ai_sdk from normal pytest collection via norecursedirs.
- Use filter+find instead of for loop in tool approval test
- Use every() instead of filter+count in multi-tool test
- Use Array.prototype.with() in replaceMessage
Server takes agent name as CLI arg and serves it at /api/chat. TestChat hardcodes the transport URL, so tests just use new TestChat().
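A minimal sketch of selecting an agent by CLI argument, assuming a simple name-to-agent registry. `AGENTS`, its keys, and `select_agent` are hypothetical; the strings stand in for real agent objects, and the actual server.py may be structured differently.

```python
import argparse

# Hypothetical registry; the string values stand in for real agent objects.
AGENTS = {
    "text": "text-agent",
    "tool_approval": "approval-agent",
}


def select_agent(argv: list[str]) -> str:
    """Parse the agent name from argv and return the matching registry entry."""
    parser = argparse.ArgumentParser(description="Serve one agent at /api/chat")
    # Restricting choices makes an unknown name fail fast at startup.
    parser.add_argument("agent", choices=sorted(AGENTS))
    args = parser.parse_args(argv)
    return AGENTS[args.agent]
```

Because each server process serves exactly one agent at a fixed route, the client side never needs to know which agent is running.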
- server_url fixture is now function-scoped, starts a server per test with the agent name derived from the test filename
- Discover test files via glob instead of hardcoded list
- Add test_agents_match_test_files to catch mismatches
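The discovery and mismatch check could look roughly like this. The helper names and the `test_<agent>.ts` naming rule are assumptions inferred from the commit message, not the code in test_ai_sdk.py.

```python
from pathlib import Path


def discover_test_files(test_dir: Path) -> list[Path]:
    """Find TypeScript test files by glob rather than a hardcoded list."""
    return sorted(test_dir.glob("test_*.ts"))


def agent_name_for(test_file: Path) -> str:
    """Derive the agent name from the filename (assumed rule: strip 'test_')."""
    return test_file.stem.removeprefix("test_")


def find_mismatches(
    test_dir: Path, agent_names: set[str]
) -> tuple[set[str], set[str]]:
    """Return (tests without an agent, agents without a test)."""
    from_files = {agent_name_for(p) for p in discover_test_files(test_dir)}
    return from_files - agent_names, agent_names - from_files
```

A check like test_agents_match_test_files would then assert that both returned sets are empty, so adding a test file without an agent (or the reverse) fails loudly.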
Make the approval agent retry after first denial so multi-step denial paths are exercised. Add tests for deny-retry-approve and deny-retry-deny flows. These tests currently fail, reproducing the bug where iter_tool_approval_responses yields output-denied parts.
Force-pushed from 53bc1e5 to 78a660a
@bendrucker Nice! I agree something like this is very much worth having. Would it make sense for it to live in or use https://github.com/pydantic/ai-chat-ui, though? That's the frontend we use for https://ai.pydantic.dev/web/. We could have a CI task in this pydantic-ai repo that triggers a build on that repo pointing at the latest pydantic-ai. But I'm not a frontend developer, so there may be a good reason why that won't work right and why this should really live on the Pydantic AI side, using a test harness separate from that real app.
Yeah I could see that. That then opens up 2 avenues for testing:
If this moves to the ai-chat-ui repo, 1 can slim down and focus on the more specific integration tests (e.g., various combinations of tool approval ordering) while 2 covers the overall "does it work."
@bendrucker That sounds reasonable, would you be up for implementing (some subset of) that there? :)
Yes, on it! Opened pydantic/ai-chat-ui#16, and #4670 to point to it.
E2E integration tests for AI SDK v6 against a real VercelAIAdapter server, covering text, thinking, tool calls, multi-tool, and the full tool approval lifecycle.
Three of the four approval tests reproduce #4387 and fail without #4388.
Changes
- tests/ai_sdk/server.py: Per-agent Starlette server with a single /api/chat route, selected via CLI arg
- tests/ai_sdk/helpers.ts: TestChat wrapping AI SDK's DefaultChatTransport at /api/chat
- tests/ai_sdk/test_*.ts: TypeScript tests using AbstractChat for each scenario
- tests/ai_sdk/test_ai_sdk.py: pytest orchestration that starts a server per test, with glob-based test discovery and agent/test file validation
- tests/ai_sdk/package.json: ai and @types/node dependencies
Testing
References
- build_run_input failing on deferred tool approval/denial. #4387
Pre-Review Checklist
- make format and make typecheck.
Pre-Merge Checklist