add tool approval UI and E2E test infrastructure#16

Open
bendrucker wants to merge 3 commits into pydantic:main from bendrucker:e2e-test

@bendrucker bendrucker commented Mar 15, 2026

  • Upgrades to AI SDK v6
  • Adds tool approval UI using the AI SDK Elements Confirmation component
  • Adds E2E test coverage with Playwright

This is a large PR, but given the tight coupling I kept it together rather than adding E2E testing first and then upgrading to v6. If you would like it split for easier review, I can do that. The meaningful diff is ~30 files and ~1100 lines; the rest is lock files and vendored AI Elements/shadcn component upgrades.

Changes

AI SDK Upgrade

Upgrades ai from v5 to v6 and @ai-sdk/react from v2 to v3. Adds radix-ui and shiki as new dependencies. Migrates Chat.tsx to use DefaultChatTransport for request body configuration, and sets sendAutomaticallyWhen: lastAssistantMessageIsCompleteWithApprovalResponses so the chat continues automatically once tool approvals have been answered.
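To illustrate the idea behind the automatic continuation, here is a simplified model of such a predicate. It is a sketch only: the `Message`, `ToolPart`, and `TextPart` shapes and the `shouldContinueAfterApprovals` name are hypothetical stand-ins, and the logic is an assumption about the intent, not the AI SDK's implementation of lastAssistantMessageIsCompleteWithApprovalResponses.

```typescript
// Hypothetical, simplified shapes; the real AI SDK message types are richer.
type ToolPartState =
  | "input-available"
  | "approval-requested"
  | "approval-responded"
  | "output-available"
  | "output-denied"
  | "output-error";

interface ToolPart {
  type: "tool";
  toolName: string;
  state: ToolPartState;
}

interface TextPart {
  type: "text";
  text: string;
}

interface Message {
  role: "user" | "assistant";
  parts: Array<ToolPart | TextPart>;
}

// Returns true when the last message is from the assistant, at least one tool
// call has received an approval response, and no tool call is still waiting.
function shouldContinueAfterApprovals(messages: Message[]): boolean {
  const last = messages[messages.length - 1];
  if (!last || last.role !== "assistant") return false;

  const tools = last.parts.filter((p): p is ToolPart => p.type === "tool");
  const responded = tools.some((p) => p.state === "approval-responded");
  const settled = tools.every(
    (p) => p.state !== "approval-requested" && p.state !== "input-available",
  );
  return responded && settled;
}
```

With a predicate of this kind, the client only resubmits once the user has answered every pending approval, so a half-answered set of parallel tool calls does not trigger a request.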

Tool Approval UI

Replaces custom inline approve/deny buttons in Part.tsx with the AI SDK Elements Confirmation component, scaffolded via npx shadcn@latest add @ai-elements/confirmation. The component uses React context to manage approval state and conditionally renders request, accepted, and rejected states. Adds alert.tsx as a shadcn/ui dependency.
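The conditional rendering described above can be modelled as a small pure function. This is a sketch under assumptions: `confirmationView`, `Approval`, and the `ConfirmationView` union are hypothetical names used for illustration, and the vendored Confirmation component may decide differently in edge cases.

```typescript
// Tool part states named in this PR, plus the usual input/output states.
type ToolState =
  | "input-streaming"
  | "input-available"
  | "approval-requested"
  | "approval-responded"
  | "output-available"
  | "output-denied"
  | "output-error";

// Hypothetical approval record attached to a tool part.
interface Approval {
  id: string;
  approved?: boolean;
}

type ConfirmationView = "request" | "accepted" | "rejected" | "none";

// Picks which confirmation sub-view to show for a tool part.
function confirmationView(state: ToolState, approval?: Approval): ConfirmationView {
  if (!approval) return "none"; // tool does not need approval
  if (state === "approval-requested") return "request"; // waiting for the user
  if (approval.approved === true) return "accepted";
  if (approval.approved === false) return "rejected";
  return "none";
}
```

Keeping the decision in one place like this is what sharing state through React context buys: each sub-view only needs to check whether it is the active one.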

Tool Call Rendering

Updates Part.tsx to properly handle dynamic-tool parts with toolName and custom icons via getToolIcon. Tool cards now auto-expand (defaultOpen) when in approval-requested state. Upgrades the vendored tool.tsx AI Element to include status labels and icons for all tool states including approval-requested, approval-responded, output-denied, and output-error. Adds dialog-based error display for tool errors. Adds data-tool-name attribute to tool cards for robust test selectors.
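A rough sketch of how the per-state labels, the auto-expand rule, and the test selector fit together. The label strings, the helper names (`isDefaultOpen`, `toolCardSelector`), and the `get_weather` tool name in the usage line are illustrative assumptions, not the text or code in the vendored tool.tsx.

```typescript
type ToolState =
  | "input-streaming"
  | "input-available"
  | "approval-requested"
  | "approval-responded"
  | "output-available"
  | "output-denied"
  | "output-error";

// Hypothetical labels; typing the map as Record<ToolState, string> makes the
// compiler flag any state that is missing a label.
const STATUS_LABELS: Record<ToolState, string> = {
  "input-streaming": "Pending",
  "input-available": "Running",
  "approval-requested": "Awaiting approval",
  "approval-responded": "Responded",
  "output-available": "Completed",
  "output-denied": "Denied",
  "output-error": "Error",
};

// Tool cards start expanded only while the user still has to decide.
function isDefaultOpen(state: ToolState): boolean {
  return state === "approval-requested";
}

// Builds a CSS attribute selector for a tool card. JSON.stringify quotes and
// escapes the name so embedded quotes cannot break the selector.
function toolCardSelector(toolName: string): string {
  return `[data-tool-name=${JSON.stringify(toolName)}]`;
}
```

In a Playwright test, a selector built this way could be passed to `page.locator(toolCardSelector("get_weather"))`, which avoids depending on visible text or DOM position.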

Test Infrastructure

  • Adds Playwright for E2E tests and Vitest for headless unit tests
  • Creates a deterministic Python test server (tests/server/) using pydantic-ai's FunctionModel
  • Adds real LLM models to test server (haiku, gpt-4.1-nano, gemini-2.0-flash)
  • Shared test modules organized by domain: conversation.ts, sidebar.ts, tools.ts
  • Playwright config driven by env vars: E2E_TEST_DIR (test directory), E2E_VIDEO (recording)
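As a sketch of the env-var-driven configuration, the function below derives Playwright-style settings from an environment map. The default directory, the rule that any non-empty E2E_VIDEO value turns recording on, and the `resolveE2ESettings` and `E2ESettings` names are assumptions for illustration; only the variable names and the 5s expect timeout come from this PR.

```typescript
type VideoMode = "off" | "on";

// Subset of a Playwright config, written as a plain object shape.
interface E2ESettings {
  testDir: string;
  expect: { timeout: number };
  use: { video: VideoMode };
}

// Takes the environment as a parameter so the logic can be unit tested
// without touching the real process environment.
function resolveE2ESettings(env: Record<string, string | undefined>): E2ESettings {
  return {
    // Assumed default: run the deterministic suite unless told otherwise.
    testDir: env["E2E_TEST_DIR"] ?? "tests/e2e/deterministic",
    // Centralized expect timeout of 5 seconds, in milliseconds.
    expect: { timeout: 5000 },
    // Assumed rule: any non-empty E2E_VIDEO value enables recording.
    use: { video: env["E2E_VIDEO"] ? "on" : "off" },
  };
}
```

In playwright.config.ts the result could be spread into `defineConfig({ ...resolveE2ESettings(process.env) })`, so switching between the deterministic and LLM suites is a matter of setting E2E_TEST_DIR.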

E2E Test Coverage and CI

Deterministic tests (tests/e2e/deterministic/) run on every PR against a FunctionModel test server with predictable responses. Coverage spans core messaging, model selection, conversation lifecycle (persistence, switching, deletion), tool execution (single, parallel, error recovery), and tool approval flows.

LLM tests (tests/e2e/llm/) verify real streaming from Anthropic, OpenAI, and Google. These run only via workflow_dispatch to avoid API costs, with provider API keys from repository secrets.

References

Replace custom approval buttons with AI SDK Elements Confirmation
component. Add Playwright E2E tests covering tool approval accept/deny
flows, chat, sidebar, model selection, and tool calls. Upgrade ai SDK
from v5 to v6.
@bendrucker bendrucker changed the title from "feat: add tool approval UI and E2E test infrastructure" to "add tool approval UI and E2E test infrastructure" Mar 15, 2026
…lifecycle

Add three new test suites covering gaps in E2E coverage:
- error-handling: error dialog with details, recovery text
- multi-tool: parallel tool completion, results, final text
- conversation-lifecycle: persistence across reload, switching,
  active/inactive deletion

Refactor test infrastructure:
- Centralize expect timeout (5s) in playwright config, remove 34 inline overrides
- Add data-tool-name attribute to Tool component for robust selectors
- Organize helpers by domain: conversation.ts, sidebar.ts, tools.ts
- Extract shared locators (toolCard, sidebar, chat) and actions
  (sendMessage, waitForPersisted)
- Move deterministic tests to tests/e2e/deterministic/
- Add tests/e2e/llm/ with real provider tests (anthropic, openai, google)
- Add LLM models to test server (haiku, gpt-4.1-nano, gemini-2.0-flash)
- Replace Playwright projects with env var config (E2E_TEST_DIR, E2E_VIDEO)
- Remove placeholder e2e-llm workflow, add LLM step to main CI
- Remove committed __pycache__, add to .gitignore
- Document test commands in CLAUDE.md
@bendrucker bendrucker marked this pull request as ready for review March 16, 2026 02:22