add tool approval UI and E2E test infrastructure #16
Open
bendrucker wants to merge 3 commits into pydantic:main from
Replace custom approval buttons with the AI SDK Elements `Confirmation` component. Add Playwright E2E tests covering tool approval accept/deny flows, chat, sidebar, model selection, and tool calls. Upgrade the `ai` SDK from v5 to v6.
This was referenced Mar 16, 2026
…lifecycle

Add three new test suites covering gaps in E2E coverage:
- error-handling: error dialog with details, recovery text
- multi-tool: parallel tool completion, results, final text
- conversation-lifecycle: persistence across reload, switching, active/inactive deletion

Refactor test infrastructure:
- Centralize expect timeout (5s) in playwright config, remove 34 inline overrides
- Add data-tool-name attribute to Tool component for robust selectors
- Organize helpers by domain: conversation.ts, sidebar.ts, tools.ts
- Extract shared locators (toolCard, sidebar, chat) and actions (sendMessage, waitForPersisted)
- Move deterministic tests to tests/e2e/deterministic/
- Add tests/e2e/llm/ with real provider tests (anthropic, openai, google)
- Add LLM models to test server (haiku, gpt-4.1-nano, gemini-2.0-flash)
- Replace Playwright projects with env var config (E2E_TEST_DIR, E2E_VIDEO)
- Remove placeholder e2e-llm workflow, add LLM step to main CI
- Remove committed __pycache__, add to .gitignore
- Document test commands in CLAUDE.md
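The commits above centralize a 5s `expect` timeout in the Playwright config and replace Playwright projects with the `E2E_TEST_DIR` and `E2E_VIDEO` environment variables. A sketch of how such a config could be derived from the environment, using a local type that mirrors only the `testDir`, `expect.timeout`, and `use.video` fields of Playwright's config. `buildE2EConfig`, the default directory, and the truthiness rule for `E2E_VIDEO` are assumptions, not the PR's code:

```typescript
// Local type mirroring only the Playwright config fields used in this sketch.
interface E2EConfig {
  testDir: string;
  expect: { timeout: number };
  use: { video: "on" | "off" };
}

// Assumption: with no E2E_TEST_DIR set, run the deterministic suite.
const DEFAULT_TEST_DIR = "tests/e2e/deterministic";

// Assumption: any non-empty value other than "0" or "false" enables recording.
function isTruthy(value: string | undefined): boolean {
  return (
    value !== undefined &&
    value !== "" &&
    value !== "0" &&
    value.toLowerCase() !== "false"
  );
}

// Hypothetical helper: builds the config from environment variables.
function buildE2EConfig(env: Record<string, string | undefined>): E2EConfig {
  return {
    // E2E_TEST_DIR selects the suite, e.g. tests/e2e/llm for real-provider tests.
    testDir: env.E2E_TEST_DIR ?? DEFAULT_TEST_DIR,
    // One shared expect timeout instead of per-assertion overrides.
    expect: { timeout: 5000 },
    // E2E_VIDEO toggles video recording.
    use: { video: isTruthy(env.E2E_VIDEO) ? "on" : "off" },
  };
}
```

In a `playwright.config.ts`, the result could be handed to Playwright's `defineConfig`, for example `export default defineConfig(buildE2EConfig(process.env))`. Keeping the logic in a plain function makes the env-var rules testable without launching a browser.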
This is a large PR, but given the tight coupling I kept it together rather than adding E2E testing first and then upgrading to v6. If you want me to split it for easier review, I can. The meaningful diff is ~30 files and ~1,100 lines; the rest is lock files and vendored AI Elements/shadcn component upgrades.
Changes
AI SDK Upgrade
Upgrades `ai` from v5 to v6 and `@ai-sdk/react` from v2 to v3. Adds `radix-ui` and `shiki` as new dependencies. Migrates `Chat.tsx` to use `DefaultChatTransport` for request body configuration and `sendAutomaticallyWhen: lastAssistantMessageIsCompleteWithApprovalResponses` for automatic tool approval flow continuation.

Tool Approval UI
Replaces custom inline approve/deny buttons in `Part.tsx` with the AI SDK Elements `Confirmation` component, scaffolded via `npx shadcn@latest add @ai-elements/confirmation`. The component uses React context to manage approval state and conditionally renders request, accepted, and rejected states. Adds `alert.tsx` as a shadcn/ui dependency.

Tool Call Rendering
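Tool cards show a status label for every tool part state and open by default only while an approval is pending. A minimal sketch of that mapping, assuming the seven AI SDK v6 tool part states; `STATUS_LABELS`, `statusLabel`, `defaultOpen`, and the label strings are hypothetical, not taken from the vendored `tool.tsx`:

```typescript
// Assumed AI SDK v6 tool part states.
type ToolState =
  | "input-streaming"
  | "input-available"
  | "approval-requested"
  | "approval-responded"
  | "output-available"
  | "output-error"
  | "output-denied";

// Hypothetical label text; Record<ToolState, string> forces a label for every state.
const STATUS_LABELS: Record<ToolState, string> = {
  "input-streaming": "Pending",
  "input-available": "Running",
  "approval-requested": "Awaiting approval",
  "approval-responded": "Responded",
  "output-available": "Completed",
  "output-error": "Error",
  "output-denied": "Denied",
};

function statusLabel(state: ToolState): string {
  return STATUS_LABELS[state];
}

// Cards start expanded only while the user must approve or deny the call.
function defaultOpen(state: ToolState): boolean {
  return state === "approval-requested";
}
```

Typing the table as `Record<ToolState, string>` means the compiler reports a missing entry if a new state is added to the union, so no state can render without a label.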
Updates `Part.tsx` to properly handle `dynamic-tool` parts with `toolName` and custom icons via `getToolIcon`. Tool cards now auto-expand (`defaultOpen`) when in the `approval-requested` state. Upgrades the vendored `tool.tsx` AI Element to include status labels and icons for all tool states, including `approval-requested`, `approval-responded`, `output-denied`, and `output-error`. Adds dialog-based error display for tool errors. Adds a `data-tool-name` attribute to tool cards for robust test selectors.

Test Infrastructure
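The `data-tool-name` attribute on tool cards lets tests find a card by tool name instead of by visible text, which changes as the status label changes. A small sketch of a selector builder; `toolCardSelector` is a hypothetical name, not one of the PR's helpers:

```typescript
// Hypothetical helper: CSS attribute selector for the tool card of a given tool.
function toolCardSelector(toolName: string): string {
  // Escape backslashes and double quotes so the name is safe inside a quoted attribute value.
  const escaped = toolName.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
  return `[data-tool-name="${escaped}"]`;
}
```

In a Playwright test the string would be passed to `page.locator(...)`, for example `page.locator(toolCardSelector("get_weather"))`, where `get_weather` is a made-up tool name.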
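- Test server (`tests/server/`) using `pydantic-ai`'s `FunctionModel`
- Helpers organized by domain: `conversation.ts`, `sidebar.ts`, `tools.ts`
- Environment variable configuration: `E2E_TEST_DIR` (test directory), `E2E_VIDEO` (recording)

E2E Test Coverage and CI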
Deterministic tests (`tests/e2e/deterministic/`) run on every PR against a `FunctionModel` test server with predictable responses. Coverage spans core messaging, model selection, conversation lifecycle (persistence, switching, deletion), tool execution (single, parallel, error recovery), and tool approval flows.

LLM tests (`tests/e2e/llm/`) verify real streaming from Anthropic, OpenAI, and Google. These run only via `workflow_dispatch` to avoid API costs, with provider API keys from repository secrets.
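The CI split above can be modelled as a small planning function: pull requests run only the deterministic suite, while a manual `workflow_dispatch` run also adds the LLM suite and needs provider keys. This is a hypothetical sketch, not the PR's workflow; `planE2ERun`, the secret names, and the choice to run both suites on manual dispatch are assumptions:

```typescript
// Hypothetical model of the CI split; not the actual workflow file.
type Trigger = "pull_request" | "workflow_dispatch";

// Assumed secret names; the PR does not list them.
const PROVIDER_KEYS = ["ANTHROPIC_API_KEY", "OPENAI_API_KEY", "GOOGLE_API_KEY"] as const;

interface E2EPlan {
  suites: string[];
  missingKeys: string[];
}

function planE2ERun(trigger: Trigger, env: Record<string, string | undefined>): E2EPlan {
  // Deterministic tests always run: they hit the FunctionModel test server, not a real provider.
  const suites = ["tests/e2e/deterministic"];
  if (trigger !== "workflow_dispatch") {
    return { suites, missingKeys: [] };
  }
  // Assumption: manual runs add the LLM suite, which needs every provider key.
  suites.push("tests/e2e/llm");
  const missingKeys = PROVIDER_KEYS.filter((key) => !env[key]);
  return { suites, missingKeys };
}
```

Reporting missing keys up front lets a manual run fail fast with a clear message instead of partway through the provider tests.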