Skip to content

Deflake chat input prompt entry in E2E#3186

Open
wwwillchen wants to merge 5 commits intodyad-sh:mainfrom
wwwillchen:fix/e2e-chat-input-deflake
Open

Deflake chat input prompt entry in E2E#3186
wwwillchen wants to merge 5 commits intodyad-sh:mainfrom
wwwillchen:fix/e2e-chat-input-deflake

Conversation

@wwwillchen
Copy link
Copy Markdown
Collaborator

@wwwillchen wwwillchen commented Apr 9, 2026

Summary

  • stabilize sendPrompt() by scoping to the chat-input container and using retry-based click/fill verification for Lexical
  • assert prompt text is present and send button is enabled before clicking send to avoid slow-machine focus races
  • add a short settle delay after creating a new chat in chat_tabs.spec.ts to reduce intermittent tab-switch timing flakes

Test plan

  • npm run fmt && npm run lint:fix && npm run ts
  • npm test

πŸ€– Generated with Claude Code

Made with Cursor


Open with Devin

Stabilize sendPrompt by retrying click/fill assertions against a single scoped editor locator and add a short settle wait in chat tab navigation to avoid intermittent prompt entry races on slower machines.

Made-with: Cursor
@wwwillchen
Copy link
Copy Markdown
Collaborator Author

@BugBot run

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request improves the reliability of chat interactions in E2E tests by refactoring the prompt-sending logic to include explicit visibility and state checks. However, a fixed sleep was introduced in the chat tabs test, which should be replaced with a more robust synchronization mechanism to avoid potential flakiness.

Comment thread e2e-tests/chat_tabs.spec.ts Outdated

// Chat 2
await po.chatActions.clickNewChat();
await po.sleep(2_000);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

While fixed timeouts can be used in sequential and predictable flows, they are generally discouraged in E2E tests as they can lead to flakiness. Following the principle of preferring more robust synchronization mechanisms, it is better to wait for a specific condition, such as waiting for content from the previous chat to disappear, to ensure the test is reliable across different environments.

Suggested change
await po.sleep(2_000);
await expect(po.page.getByText("[dump] build a todo app")).not.toBeVisible();
References
  1. In prototypes with sequential and predictable flows, setTimeout can be used for state synchronization. However, prefer more robust mechanisms like state machines for production code.

@wwwillchen
Copy link
Copy Markdown
Collaborator Author

@BugBot run

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

πŸ’‘ Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1a60832fad

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with πŸ‘.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +91 to +93
const chatInput = this.getChatInputContainer().locator(
'[data-lexical-editor="true"]',
);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Support home chat container in sendPrompt

sendPrompt() now scopes the editor lookup to chat-input-container, but many E2E flows call this helper from the Apps/Home screen where only home-chat-input-container is rendered (e.g., PageObject.setUp() navigates to Apps before first prompt, and home_chat_existing_app.spec.ts sends from home). In that context expect(chatInput).toBeVisible() will time out, so the helper can no longer send the initial prompt to create/select an app.

Useful? React with πŸ‘Β / πŸ‘Ž.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to reduce Playwright E2E flakes around entering prompts into the Lexical chat input and switching between chats/tabs.

Changes:

  • Updated ChatActions.sendPrompt() to scope to chat-input-container and add retry-based click/fill verification before sending.
  • Added a 2s settle delay after creating a new chat in chat_tabs.spec.ts to reduce intermittent timing issues.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
e2e-tests/helpers/page-objects/components/ChatActions.ts Adds retry-based prompt entry + pre-send assertions, and scopes the Lexical editor to the chat input container.
e2e-tests/chat_tabs.spec.ts Adds a fixed delay after clickNewChat() to reduce tab-switch timing flakes.

πŸ’‘ Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +91 to +95
const chatInput = this.getChatInputContainer().locator(
'[data-lexical-editor="true"]',
);
const sendButton = this.page.getByRole("button", { name: "Send message" });

Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sendPrompt() now hard-scopes the Lexical editor to chat-input-container. That will fail on the Apps/Home page where the prompt input lives under home-chat-input-container (see HomeChatInput using data-testid="home-chat-input-container"). Consider selecting the visible container (chat vs home) at runtime, or adding an explicit location option and scoping both the editor and the Send button to that container to avoid targeting the wrong UI.

Copilot uses AI. Check for mistakes.
Comment on lines +100 to +101
await expect(chatInput).toContainText(prompt);
await expect(sendButton).toBeEnabled();
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inside expect(...).toPass(), the nested Playwright expect(chatInput).toContainText(...) and expect(sendButton).toBeEnabled() each have their own auto-wait timeouts (defaults), which can block a single attempt for seconds and largely defeats the retry loop. Prefer asserting via textContent() / isEnabled() inside the callback, or set explicit short/zero timeouts on the nested expect calls so toPass can retry quickly.

Suggested change
await expect(chatInput).toContainText(prompt);
await expect(sendButton).toBeEnabled();
const chatInputText = await chatInput.textContent();
expect(chatInputText ?? "").toContain(prompt);
expect(await sendButton.isEnabled()).toBe(true);

Copilot uses AI. Check for mistakes.
Comment thread e2e-tests/chat_tabs.spec.ts Outdated

// Chat 2
await po.chatActions.clickNewChat();
await po.sleep(2_000);
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fixed sleep(2_000) will slow the suite and can still be flaky on very slow/fast machines. Prefer waiting on a concrete post-clickNewChat() condition (e.g., chat id changes, input clears/becomes enabled, or the new tab becomes active) using expect(...).toPass()/locator assertions instead of an unconditional delay.

Copilot uses AI. Check for mistakes.
@wwwillchen
Copy link
Copy Markdown
Collaborator Author

@BugBot run

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 2 files

Confidence score: 3/5

  • Some risk due to a concrete user-impacting change: sendPrompt() in e2e-tests/helpers/page-objects/components/ChatActions.ts now scopes the editor to chat-input-container, which can break Home/Apps page flows that send prompts from outside that container.
  • The nested assertions inside toPass() in e2e-tests/helpers/page-objects/components/ChatActions.ts can compound timeouts, increasing E2E flakiness and slow retries, but this is not a production runtime risk.
  • Pay close attention to e2e-tests/helpers/page-objects/components/ChatActions.ts - potential E2E flow breakage and compounded assertion timeouts.
Prompt for AI agents (unresolved issues)

Check if these issues are valid β€” if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="e2e-tests/helpers/page-objects/components/ChatActions.ts">

<violation number="1" location="e2e-tests/helpers/page-objects/components/ChatActions.ts:91">
P1: `sendPrompt()` now scopes the Lexical editor lookup to `getChatInputContainer()` (i.e. `chat-input-container`), but E2E flows that send prompts from the Home/Apps page (e.g. `home_chat_existing_app.spec.ts`, initial `setUp()` prompts) target a different container (`home-chat-input-container`). The `expect(chatInput).toBeVisible()` call will time out in those contexts. Consider selecting the visible container at runtime or accepting an explicit scope parameter.</violation>

<violation number="2" location="e2e-tests/helpers/page-objects/components/ChatActions.ts:100">
P2: The nested web-first assertions (`toContainText`, `toBeEnabled`) inside the `toPass()` callback each carry their own default auto-wait timeout (~5s each). A single retry attempt can block for up to 10s on these assertions, largely defeating the retry loop. Use instant checks (e.g., `await chatInput.textContent()` + synchronous `expect`, and `await sendButton.isEnabled()`) or pass `{ timeout: 0 }` to the nested expects so `toPass` retries quickly.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread e2e-tests/helpers/page-objects/components/ChatActions.ts Outdated
Comment on lines +100 to +101
await expect(chatInput).toContainText(prompt);
await expect(sendButton).toBeEnabled();
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: The nested web-first assertions (toContainText, toBeEnabled) inside the toPass() callback each carry their own default auto-wait timeout (~5s each). A single retry attempt can block for up to 10s on these assertions, largely defeating the retry loop. Use instant checks (e.g., await chatInput.textContent() + synchronous expect, and await sendButton.isEnabled()) or pass { timeout: 0 } to the nested expects so toPass retries quickly.

Prompt for AI agents
Check if this issue is valid β€” if so, understand the root cause and fix it. At e2e-tests/helpers/page-objects/components/ChatActions.ts, line 100:

<comment>The nested web-first assertions (`toContainText`, `toBeEnabled`) inside the `toPass()` callback each carry their own default auto-wait timeout (~5s each). A single retry attempt can block for up to 10s on these assertions, largely defeating the retry loop. Use instant checks (e.g., `await chatInput.textContent()` + synchronous `expect`, and `await sendButton.isEnabled()`) or pass `{ timeout: 0 }` to the nested expects so `toPass` retries quickly.</comment>

<file context>
@@ -88,9 +88,20 @@ export class ChatActions {
+    await expect(async () => {
+      await chatInput.click();
+      await chatInput.fill(prompt);
+      await expect(chatInput).toContainText(prompt);
+      await expect(sendButton).toBeEnabled();
+    }).toPass({ timeout: Timeout.SHORT });
</file context>
Suggested change
await expect(chatInput).toContainText(prompt);
await expect(sendButton).toBeEnabled();
const chatInputText = await chatInput.textContent();
expect(chatInputText ?? "").toContain(prompt);
expect(await sendButton.isEnabled()).toBe(true);
Fix with Cubic

devin-ai-integration[bot]

This comment was marked as resolved.

cursor[bot]

This comment was marked as resolved.

@github-actions github-actions bot added the needs-human:review-issue ai agent flagged an issue that requires human review label Apr 9, 2026
@wwwillchen
Copy link
Copy Markdown
Collaborator Author

@BugBot run

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

πŸ’‘ Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c2d660a9d9

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with πŸ‘.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

const chatInput = this.getChatInput();
const sendButton = this.page.getByRole("button", { name: "Send message" });

await expect(chatInput).toBeVisible();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Apply MEDIUM timeout to initial chat-input visibility wait

The new expect(chatInput).toBeVisible() pre-check uses Playwright’s default assertion timeout (~5s), so on slower CI runs after importApp() it can fail before the toPass({ timeout: Timeout.MEDIUM }) retry logic even starts. Before this commit, click()/fill() used the longer action/test timeout budget, so this change can reintroduce prompt-entry flakes in the same paths this helper is meant to stabilize. Use timeout: Timeout.MEDIUM for this visibility check (or move it inside the retry block) so waits are consistent.

Useful? React with πŸ‘Β / πŸ‘Ž.

const chatInput = this.getChatInput();
const sendButton = this.page.getByRole("button", { name: "Send message" });

await expect(chatInput).toBeVisible();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Visibility assertion uses short default timeout before retry

Low Severity

The new await expect(chatInput).toBeVisible() assertion runs before the toPass retry block and uses Playwright's default 5-second assertion timeout. The subsequent retry block uses Timeout.MEDIUM (15–30s on CI). On slow CI machines, this creates a bottleneck: if the chat input takes more than 5 seconds to render (e.g., after clickNewChat() triggers a page transition), the method throws before the more resilient retry block ever runs. The old code had no such gateβ€”click() would wait using the much longer action/test timeout. Other assertions in this codebase pass explicit timeouts when longer waits are needed (e.g., toBeVisible({ timeout: Timeout.MEDIUM })). For a PR aimed at deflaking, this could introduce a new flake source.

Fix in CursorΒ Fix in Web

Reviewed by Cursor Bugbot for commit c2d660a. Configure here.

@wwwillchen
Copy link
Copy Markdown
Collaborator Author

@BugBot run

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

βœ… Bugbot reviewed your changes and found no new issues!

1 issue from previous review remains unresolved.

Fix All in Cursor

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 032be8e. Configure here.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

πŸ’‘ Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 032be8e8af

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with πŸ‘.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

await expect(async () => {
await chatInput.click();
await chatInput.fill(prompt);
await expect(chatInput).toContainText(prompt);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Drop raw-input equality check for mention prompts

sendPrompt() now gates submission on expect(chatInput).toContainText(prompt), but the Lexical editor rewrites internal mention formats to display text (e.g. @app:name becomes @name, similarly for other mention types) before render. In flows that use mention syntax (for example po.sendPrompt("[dump] @app:minimal-with-ai-rules hi")), this assertion never becomes true, so the helper retries until timeout and never clicks Send despite a valid prompt.

Useful? React with πŸ‘Β / πŸ‘Ž.

Copy link
Copy Markdown
Contributor

@dyad-assistant dyad-assistant bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multi-agent review: 4 new issue(s) found (3 MEDIUM, 1 LOW).

Comment thread rules/e2e-testing.md

- **Clearing input**: `fill("")` doesn't reliably clear Lexical. Use keyboard shortcuts instead: `Meta+a` then `Backspace`.
- **Timing issues**: Lexical may need time to update its internal state. Use `toPass()` with retries for resilient tests.
- **Avoid locator drift**: When both home/chat inputs may exist, scope the editor locator to the specific container (for example `chat-input-container`) and reuse one locator instance for click/fill/assertions.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟑 MEDIUM | documentation-code-mismatch

New rule contradicts sendPrompt() implementation and PR description

This new rule instructs readers to "scope the editor locator to the specific container (for example chat-input-container) and reuse one locator instance". But the actual sendPrompt() in this PR uses this.getChatInput(), which is a global unscoped locator:

getChatInput() {
  return this.page.locator(
    '[data-lexical-editor="true"][aria-placeholder^="Ask Dyad to build"]',
  );
}

That means the same placeholder under home-chat-input-container also matches. The PR description compounds this β€” it says "stabilize sendPrompt() by scoping to the chat-input container" and "add a short settle delay after creating a new chat in chat_tabs.spec.ts", but neither of those changes is in the current diff (the scoping was reverted after breaking home-page tests; no sleep is present in chat_tabs.spec.ts).

Future contributors will follow this rule expecting it to describe sendPrompt(), then be confused when the code does the opposite. Either:

  • Update the rule to describe the actual pattern (retry-based toPass over a global locator, with container scoping reserved for tests that need to disambiguate), and update the PR description to match, or
  • Refactor sendPrompt() to pick getChatInputContainer() vs getHomeChatInputContainer() at runtime and scope the editor lookup inside it.

πŸ’‘ Suggestion: At minimum, fix the PR description so reviewers and git blame readers aren't misled about what changed.

await expect(chatInput).toBeVisible();
await expect(async () => {
await chatInput.click();
await chatInput.fill(prompt);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟑 MEDIUM | retry-correctness

Retry loop can silently produce accumulated text in the Lexical editor

Inside the toPass() callback, each retry unconditionally calls:

await chatInput.click();
await chatInput.fill(prompt);
await expect(chatInput).toContainText(prompt);
await expect(sendButton).toBeEnabled();

The newly-added rule in rules/e2e-testing.md already warns that fill() is unreliable on Lexical (fill("") doesn't clear). If a retry fires β€” e.g., because toBeEnabled() hadn't updated yet on the first pass β€” the second fill(prompt) may append to the existing content instead of replacing it, leaving the editor with "First chat unique message alphaFirst chat unique message alpha".

toContainText(prompt) is a substring check, so the assertion still passes on the accumulated text. The wrong prompt is then clicked through to the LLM and the test quietly reports success. This is the classic "deflake masks a real bug" failure mode β€” the retry turns a loud flake into a silent correctness bug.

πŸ’‘ Suggestion: Use toHaveText(prompt) instead of toContainText(prompt) so accumulation forces another retry, or explicitly clear the editor (ControlOrMeta+a / Backspace, as clearChatInput() does) at the top of each retry attempt before calling fill().

test("tabs appear after navigating between chats", async ({ po }) => {
await po.setUp({ autoApprove: true });
await po.importApp("minimal");
await po.importApp("minimal-with-ai-rules");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟑 MEDIUM | unexplained-change

Fixture swap from minimal β†’ minimal-with-ai-rules is not explained

All 6 tests in this file switched their fixture. The only difference between the two fixtures is the presence of AI_RULES.md:

$ diff -rq e2e-tests/fixtures/import-app/minimal e2e-tests/fixtures/import-app/minimal-with-ai-rules
Only in minimal-with-ai-rules: AI_RULES.md

~15 other spec files still use minimal, so this is now an outlier, and the PR description doesn't mention the switch at all. Two problems:

  1. Coverage drift: the tab-UI paths are no longer exercised against the baseline minimal fixture. If tab behavior regresses when AI rules are absent (different banners, setup flow, focus stealing), none of these tests will catch it.
  2. Cargo-culting risk: a future maintainer won't know whether the swap was load-bearing for deflaking, or an accidental change that should be reverted.

πŸ’‘ Suggestion: If the swap is needed to avoid a focus/race condition caused by the no-AI-rules onboarding path, call that out in the PR description and add a one-line // comment next to the first importApp("minimal-with-ai-rules") explaining why. Otherwise, revert to minimal to stay consistent with the rest of the e2e suite.

await expect(sendButton).toBeEnabled();
}).toPass({ timeout: Timeout.MEDIUM });

await sendButton.click();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟒 LOW | race-condition

sendButton.click() is outside the retry loop

Once toPass() resolves, the final click is a single-shot call:

}).toPass({ timeout: Timeout.MEDIUM });

await sendButton.click();

If the send button flips back to disabled between the final retry check and the actual click (e.g., React commits a state change while re-validating the Lexical content), the click will either miss or be silently swallowed β€” putting us right back in the flake class this PR is trying to fix.

πŸ’‘ Suggestion: After sendButton.click(), assert that the chat is actually in a sending state (spinner visible, send button disabled, or the message appears in the history) so any lost click surfaces as a loud failure rather than a timeout further downstream in waitForChatCompletion().

@dyad-assistant
Copy link
Copy Markdown
Contributor

πŸ” Dyadbot Code Review Summary

Verdict: πŸ€” NOT SURE - Potential issues

Reviewed by 3 independent agents: Correctness Expert, Code Health Expert, UX Wizard. Findings validated against existing reviewer comments and the current state of the diff on 032be8e.

Issues Summary

Severity File Issue
🟑 MEDIUM rules/e2e-testing.md:64 New rule + PR description contradict the sendPrompt() implementation
🟑 MEDIUM e2e-tests/helpers/page-objects/components/ChatActions.ts:97 Retry loop can silently accumulate text in the Lexical editor
🟑 MEDIUM e2e-tests/chat_tabs.spec.ts:6 Fixture swap minimal β†’ minimal-with-ai-rules is unexplained (6 tests)
🟒 LOW e2e-tests/helpers/page-objects/components/ChatActions.ts:102 Final sendButton.click() is outside the retry loop

Key concern: PR description is stale

The PR description currently says:

  1. "stabilize sendPrompt() by scoping to the chat-input container"
  2. "assert prompt text is present and send button is enabled before clicking send"
  3. "add a short settle delay after creating a new chat in chat_tabs.spec.ts"

Only (2) is actually in the diff. The scoping in (1) was reverted after it broke home-page tests (sendPrompt() now uses the unscoped global getChatInput()), and no sleep is present in chat_tabs.spec.ts. The newly-added rule in rules/e2e-testing.md also describes container scoping that sendPrompt() does not do.

This matters because the rule is prescriptive for future contributors, and a stale PR description makes the git blame trail unreliable. Please reconcile the description, rule, and code before merging.

🚫 Dropped false positives / merged duplicates (4 items)
  • toContainText vs toHaveText β€” Merged into the retry-accumulation MEDIUM issue (they're the same root cause).
  • sleep(2_000) lacks a WHY comment β€” Dropped: the sleep is not in the current diff, only in earlier commits that existing reviewers flagged.
  • Retry loop masks real user-visible bugs (missed first-attempt regressions) β€” Dropped: the project's own rules/e2e-testing.md explicitly endorses toPass() retries for Lexical's async state updates, so this is a judgment call rather than an actionable bug.
  • toContainText(prompt) fragile for regex-special characters β€” Dropped: Playwright's toContainText with a string argument treats it as a literal substring, not a regex, so bracket characters in prompts like [dump] are safe.

Generated by Dyadbot multi-agent code review

@github-actions
Copy link
Copy Markdown
Contributor

🎭 Playwright Test Results

❌ Some tests failed

OS Passed Failed Flaky Skipped
🍎 macOS 275 5 5 6

Summary: 275 passed, 5 failed, 5 flaky, 6 skipped

Failed Tests

🍎 macOS

  • local_agent_advanced.spec.ts > local-agent - mention apps
    • Error: expect(locator).toContainText(expected) failed
  • mention_app.spec.ts > mention app (without pro)
    • Error: expect(locator).toContainText(expected) failed
  • mention_app.spec.ts > mention app (with pro)
    • Error: expect(locator).toContainText(expected) failed
  • per_chat_input.spec.ts > input preserved when switching back and forth multiple times
    • Error: expect(locator).toContainText(expected) failed
  • smart_context_deep.spec.ts > smart context deep - mention app should fallback to balanced
    • Error: expect(locator).toContainText(expected) failed

πŸ“‹ Re-run Failing Tests (macOS)

Copy and paste to re-run all failing spec files locally:

npm run e2e \
  e2e-tests/local_agent_advanced.spec.ts \
  e2e-tests/mention_app.spec.ts \
  e2e-tests/per_chat_input.spec.ts \
  e2e-tests/smart_context_deep.spec.ts

⚠️ Flaky Tests

🍎 macOS

  • chat_tabs.spec.ts > tabs appear after navigating between chats (passed after 1 retry)
  • chat_tabs.spec.ts > clicking a tab switches to that chat (passed after 1 retry)
  • cloud_sandbox.spec.ts > cloud sandbox undo restores the remote snapshot (passed after 1 retry)
  • local_agent_code_search.spec.ts > local-agent - code search (passed after 1 retry)
  • setup_flow.spec.ts > Setup Flow > setup banner shows correct state when node.js is installed (passed after 1 retry)

πŸ“Š View full report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs-human:review-issue ai agent flagged an issue that requires human review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants