
Add AI SDK E2E integration tests #4390

Closed

bendrucker wants to merge 13 commits into pydantic:main from bendrucker:ai-sdk-test-e2e

Conversation

@bendrucker (Contributor) commented Feb 20, 2026

E2E integration tests for AI SDK v6 against a real VercelAIAdapter server, covering text, thinking, tool calls, multi-tool, and the full tool approval lifecycle.

Three of the four approval tests reproduce #4387 and fail without #4388.

Note: This branch is based on fix-ai-sdk-v6-approval-types. I'll rebase onto main after #4388 merges to remove those changes from the diff.

Changes

  • tests/ai_sdk/server.py: Per-agent Starlette server with a single /api/chat route, selected via CLI arg
  • tests/ai_sdk/helpers.ts: TestChat wrapping AI SDK's DefaultChatTransport at /api/chat
  • tests/ai_sdk/test_*.ts: TypeScript tests using AbstractChat for each scenario
  • tests/ai_sdk/test_ai_sdk.py: pytest orchestration that starts a server per test, with glob-based test discovery and agent/test file validation
  • tests/ai_sdk/package.json: ai and @types/node dependencies

Testing

uv run pytest tests/ai_sdk/test_ai_sdk.py -xvs

References

Pre-Review Checklist

  • Any AI generated code has been reviewed line-by-line by the human PR author, who stands by it.
  • No breaking changes in accordance with the version policy.
  • Linting and type checking pass per make format and make typecheck.
  • PR title is fit for the release changelog.

Pre-Merge Checklist

  • New tests for any fix or new behavior, maintaining 100% coverage.
  • Updated documentation for new features and behaviors, including docstrings for API docs.

@github-actions bot added the "size: M (Medium PR, 101-500 weighted lines)" and "chore" labels Feb 20, 2026
@bendrucker changed the title from "Add AI SDK E2E integration tests for tool approval lifecycle" to "Add AI SDK E2E integration tests" Feb 20, 2026
@bendrucker (Contributor, Author) commented

Putting this up for discussion before I spend more time cleaning up any of the messy test lifecycle bits, especially server startup/teardown.

@github-actions bot added the "size: L (Large PR, 501-1500 weighted lines)" label and removed the "size: M (Medium PR, 101-500 weighted lines)" label Feb 24, 2026
The AI SDK v6 tool approval flow uses three additional tool part states
(approval-requested, approval-responded, output-denied) that were not
defined in request_types.py. When the SDK client sent messages with
these states, Pydantic validation rejected them.

Add 6 new models (3 static + 3 dynamic) for the missing states, update
the ToolUIPart/DynamicToolUIPart unions, register them in
_TOOL_PART_TYPES, and handle output-denied as a terminal state in
load_messages().

Narrow the isinstance check to only approval-responded part types.
Output-denied parts are already materialized by load_messages and
must not be re-processed as deferred results.
Reproduces pydantic#4387: the Pydantic AI server rejects messages with
approval-requested, approval-responded, and output-denied tool states
that the AI SDK v6 client produces during the tool approval lifecycle.

The harness starts a real Starlette server with VercelAIAdapter and
drives it from a node:test script using the AI SDK's AbstractChat.

Rename tests/js_integration/ to tests/ai_sdk/, convert to TypeScript,
and add test cases for approval, denial, and denial with reason.

…narios

Split tests into separate files per scenario with shared helpers. Use
TestModel for natural agent behavior instead of hand-crafted stream deltas.

Exclude tests/ai_sdk from normal pytest collection via norecursedirs.

- Use filter+find instead of for loop in tool approval test
- Use every() instead of filter+count in multi-tool test
- Use Array.prototype.with() in replaceMessage

Server takes agent name as CLI arg and serves it at /api/chat.
TestChat hardcodes the transport URL, so tests just use new TestChat().
- server_url fixture is now function-scoped, starts a server per test
  with the agent name derived from the test filename
- Discover test files via glob instead of hardcoded list
- Add test_agents_match_test_files to catch mismatches

Make the approval agent retry after first denial so multi-step
denial paths are exercised. Add tests for deny-retry-approve and
deny-retry-deny flows. These tests currently fail, reproducing the
bug where iter_tool_approval_responses yields output-denied parts.
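The discovery and mismatch check described above can be sketched as follows. The function names and the test_&lt;agent&gt;.ts filename convention are assumptions for illustration, not the exact test_ai_sdk.py code:

```python
# Hypothetical sketch of glob-based scenario discovery and validation.
from pathlib import Path


def discover_test_files(root: Path) -> list[Path]:
    """Find every TypeScript scenario file instead of keeping a hardcoded list."""
    return sorted(root.glob("test_*.ts"))


def agent_name(test_file: Path) -> str:
    """Derive the agent a test needs from its filename: test_approval.ts -> approval."""
    return test_file.stem.removeprefix("test_")


def check_agents_match(test_files: list[Path], agents: set[str]) -> None:
    """Fail loudly when test files and registered agents drift apart."""
    expected = {agent_name(f) for f in test_files}
    missing = expected - agents
    extra = agents - expected
    if missing or extra:
        raise AssertionError(f"missing agents: {missing}, unused agents: {extra}")
```

A function-scoped fixture can then call agent_name() on the current test file and pass the result as the server's CLI arg, so adding a scenario is just adding one test_*.ts file and one agent.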
@DouweM (Collaborator) commented Mar 4, 2026

@bendrucker Nice! I agree something like this is very much worth having.

Would it make sense to live in or use https://github.com/pydantic/ai-chat-ui, though? That's the frontend we use for https://ai.pydantic.dev/web/. We could have a CI task in this pydantic-ai repo that triggers a build on that repo pointing at the latest pydantic-ai.

But I'm not a frontend developer, and there may be a good reason why that won't work and this should really live on the Pydantic AI side, using a separate test harness rather than that real app.

@bendrucker (Contributor, Author) commented

Would it make sense to live in or use https://github.com/pydantic/ai-chat-ui, though?

Yeah I could see that. That then opens up 2 avenues for testing:

  1. Running headless tests in Node.js (this)
  2. Automated browser testing of the AI Chat UI app (https://playwright.dev/)

If this moves to the ai-chat-ui repo, 1 can slim down and focus on the more specific integration tests (e.g., various combinations of tool approval ordering) while 2 covers the overall "does it work."

@DouweM (Collaborator) commented Mar 6, 2026

@bendrucker That sounds reasonable, would you be up for implementing (some subset of) that there? :)

@bendrucker (Contributor, Author) commented

Yes, on it!

Opened pydantic/ai-chat-ui#16, and #4670 to point to it.

@bendrucker bendrucker closed this Mar 16, 2026