Deflake chat input prompt entry in E2E#3186
Conversation
Stabilize sendPrompt by retrying click/fill assertions against a single scoped editor locator and add a short settle wait in chat tab navigation to avoid intermittent prompt entry races on slower machines. Made-with: Cursor
|
@BugBot run |
There was a problem hiding this comment.
Code Review
This pull request improves the reliability of chat interactions in E2E tests by refactoring the prompt-sending logic to include explicit visibility and state checks. However, a fixed sleep was introduced in the chat tabs test, which should be replaced with a more robust synchronization mechanism to avoid potential flakiness.
|
|
||
| // Chat 2 | ||
| await po.chatActions.clickNewChat(); | ||
| await po.sleep(2_000); |
There was a problem hiding this comment.
While fixed timeouts can be used in sequential and predictable flows, they are generally discouraged in E2E tests as they can lead to flakiness. Following the principle of preferring more robust synchronization mechanisms, it is better to wait for a specific condition, such as waiting for content from the previous chat to disappear, to ensure the test is reliable across different environments.
| await po.sleep(2_000); | |
| await expect(po.page.getByText("[dump] build a todo app")).not.toBeVisible(); |
References
- In prototypes with sequential and predictable flows, setTimeout can be used for state synchronization. However, prefer more robust mechanisms like state machines for production code.
Made-with: Cursor
|
@BugBot run |
There was a problem hiding this comment.
π‘ Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1a60832fad
βΉοΈ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with π.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const chatInput = this.getChatInputContainer().locator( | ||
| '[data-lexical-editor="true"]', | ||
| ); |
There was a problem hiding this comment.
Support home chat container in sendPrompt
sendPrompt() now scopes the editor lookup to chat-input-container, but many E2E flows call this helper from the Apps/Home screen where only home-chat-input-container is rendered (e.g., PageObject.setUp() navigates to Apps before first prompt, and home_chat_existing_app.spec.ts sends from home). In that context expect(chatInput).toBeVisible() will time out, so the helper can no longer send the initial prompt to create/select an app.
Useful? React with πΒ / π.
There was a problem hiding this comment.
Pull request overview
This PR aims to reduce Playwright E2E flakes around entering prompts into the Lexical chat input and switching between chats/tabs.
Changes:
- Updated
ChatActions.sendPrompt()to scope tochat-input-containerand add retry-based click/fill verification before sending. - Added a 2s settle delay after creating a new chat in
chat_tabs.spec.tsto reduce intermittent timing issues.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| e2e-tests/helpers/page-objects/components/ChatActions.ts | Adds retry-based prompt entry + pre-send assertions, and scopes the Lexical editor to the chat input container. |
| e2e-tests/chat_tabs.spec.ts | Adds a fixed delay after clickNewChat() to reduce tab-switch timing flakes. |
π‘ Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| const chatInput = this.getChatInputContainer().locator( | ||
| '[data-lexical-editor="true"]', | ||
| ); | ||
| const sendButton = this.page.getByRole("button", { name: "Send message" }); | ||
|
|
There was a problem hiding this comment.
sendPrompt() now hard-scopes the Lexical editor to chat-input-container. That will fail on the Apps/Home page where the prompt input lives under home-chat-input-container (see HomeChatInput using data-testid="home-chat-input-container"). Consider selecting the visible container (chat vs home) at runtime, or adding an explicit location option and scoping both the editor and the Send button to that container to avoid targeting the wrong UI.
| await expect(chatInput).toContainText(prompt); | ||
| await expect(sendButton).toBeEnabled(); |
There was a problem hiding this comment.
Inside expect(...).toPass(), the nested Playwright expect(chatInput).toContainText(...) and expect(sendButton).toBeEnabled() each have their own auto-wait timeouts (defaults), which can block a single attempt for seconds and largely defeats the retry loop. Prefer asserting via textContent() / isEnabled() inside the callback, or set explicit short/zero timeouts on the nested expect calls so toPass can retry quickly.
| await expect(chatInput).toContainText(prompt); | |
| await expect(sendButton).toBeEnabled(); | |
| const chatInputText = await chatInput.textContent(); | |
| expect(chatInputText ?? "").toContain(prompt); | |
| expect(await sendButton.isEnabled()).toBe(true); |
|
|
||
| // Chat 2 | ||
| await po.chatActions.clickNewChat(); | ||
| await po.sleep(2_000); |
There was a problem hiding this comment.
The fixed sleep(2_000) will slow the suite and can still be flaky on very slow/fast machines. Prefer waiting on a concrete post-clickNewChat() condition (e.g., chat id changes, input clears/becomes enabled, or the new tab becomes active) using expect(...).toPass()/locator assertions instead of an unconditional delay.
|
@BugBot run |
There was a problem hiding this comment.
2 issues found across 2 files
Confidence score: 3/5
- Some risk due to a concrete user-impacting change:
sendPrompt()ine2e-tests/helpers/page-objects/components/ChatActions.tsnow scopes the editor tochat-input-container, which can break Home/Apps page flows that send prompts from outside that container. - The nested assertions inside
toPass()ine2e-tests/helpers/page-objects/components/ChatActions.tscan compound timeouts, increasing E2E flakiness and slow retries, but this is not a production runtime risk. - Pay close attention to
e2e-tests/helpers/page-objects/components/ChatActions.ts- potential E2E flow breakage and compounded assertion timeouts.
Prompt for AI agents (unresolved issues)
Check if these issues are valid β if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="e2e-tests/helpers/page-objects/components/ChatActions.ts">
<violation number="1" location="e2e-tests/helpers/page-objects/components/ChatActions.ts:91">
P1: `sendPrompt()` now scopes the Lexical editor lookup to `getChatInputContainer()` (i.e. `chat-input-container`), but E2E flows that send prompts from the Home/Apps page (e.g. `home_chat_existing_app.spec.ts`, initial `setUp()` prompts) target a different container (`home-chat-input-container`). The `expect(chatInput).toBeVisible()` call will time out in those contexts. Consider selecting the visible container at runtime or accepting an explicit scope parameter.</violation>
<violation number="2" location="e2e-tests/helpers/page-objects/components/ChatActions.ts:100">
P2: The nested web-first assertions (`toContainText`, `toBeEnabled`) inside the `toPass()` callback each carry their own default auto-wait timeout (~5s each). A single retry attempt can block for up to 10s on these assertions, largely defeating the retry loop. Use instant checks (e.g., `await chatInput.textContent()` + synchronous `expect`, and `await sendButton.isEnabled()`) or pass `{ timeout: 0 }` to the nested expects so `toPass` retries quickly.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| await expect(chatInput).toContainText(prompt); | ||
| await expect(sendButton).toBeEnabled(); |
There was a problem hiding this comment.
P2: The nested web-first assertions (toContainText, toBeEnabled) inside the toPass() callback each carry their own default auto-wait timeout (~5s each). A single retry attempt can block for up to 10s on these assertions, largely defeating the retry loop. Use instant checks (e.g., await chatInput.textContent() + synchronous expect, and await sendButton.isEnabled()) or pass { timeout: 0 } to the nested expects so toPass retries quickly.
Prompt for AI agents
Check if this issue is valid β if so, understand the root cause and fix it. At e2e-tests/helpers/page-objects/components/ChatActions.ts, line 100:
<comment>The nested web-first assertions (`toContainText`, `toBeEnabled`) inside the `toPass()` callback each carry their own default auto-wait timeout (~5s each). A single retry attempt can block for up to 10s on these assertions, largely defeating the retry loop. Use instant checks (e.g., `await chatInput.textContent()` + synchronous `expect`, and `await sendButton.isEnabled()`) or pass `{ timeout: 0 }` to the nested expects so `toPass` retries quickly.</comment>
<file context>
@@ -88,9 +88,20 @@ export class ChatActions {
+ await expect(async () => {
+ await chatInput.click();
+ await chatInput.fill(prompt);
+ await expect(chatInput).toContainText(prompt);
+ await expect(sendButton).toBeEnabled();
+ }).toPass({ timeout: Timeout.SHORT });
</file context>
| await expect(chatInput).toContainText(prompt); | |
| await expect(sendButton).toBeEnabled(); | |
| const chatInputText = await chatInput.textContent(); | |
| expect(chatInputText ?? "").toContain(prompt); | |
| expect(await sendButton.isEnabled()).toBe(true); |
|
@BugBot run |
There was a problem hiding this comment.
π‘ Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c2d660a9d9
βΉοΈ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with π.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const chatInput = this.getChatInput(); | ||
| const sendButton = this.page.getByRole("button", { name: "Send message" }); | ||
|
|
||
| await expect(chatInput).toBeVisible(); |
There was a problem hiding this comment.
Apply MEDIUM timeout to initial chat-input visibility wait
The new expect(chatInput).toBeVisible() pre-check uses Playwrightβs default assertion timeout (~5s), so on slower CI runs after importApp() it can fail before the toPass({ timeout: Timeout.MEDIUM }) retry logic even starts. Before this commit, click()/fill() used the longer action/test timeout budget, so this change can reintroduce prompt-entry flakes in the same paths this helper is meant to stabilize. Use timeout: Timeout.MEDIUM for this visibility check (or move it inside the retry block) so waits are consistent.
Useful? React with πΒ / π.
| const chatInput = this.getChatInput(); | ||
| const sendButton = this.page.getByRole("button", { name: "Send message" }); | ||
|
|
||
| await expect(chatInput).toBeVisible(); |
There was a problem hiding this comment.
Visibility assertion uses short default timeout before retry
Low Severity
The new await expect(chatInput).toBeVisible() assertion runs before the toPass retry block and uses Playwright's default 5-second assertion timeout. The subsequent retry block uses Timeout.MEDIUM (15β30s on CI). On slow CI machines, this creates a bottleneck: if the chat input takes more than 5 seconds to render (e.g., after clickNewChat() triggers a page transition), the method throws before the more resilient retry block ever runs. The old code had no such gateβclick() would wait using the much longer action/test timeout. Other assertions in this codebase pass explicit timeouts when longer waits are needed (e.g., toBeVisible({ timeout: Timeout.MEDIUM })). For a PR aimed at deflaking, this could introduce a new flake source.
Reviewed by Cursor Bugbot for commit c2d660a. Configure here.
|
@BugBot run |
There was a problem hiding this comment.
β Bugbot reviewed your changes and found no new issues!
1 issue from previous review remains unresolved.
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit 032be8e. Configure here.
There was a problem hiding this comment.
π‘ Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 032be8e8af
βΉοΈ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with π.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| await expect(async () => { | ||
| await chatInput.click(); | ||
| await chatInput.fill(prompt); | ||
| await expect(chatInput).toContainText(prompt); |
There was a problem hiding this comment.
Drop raw-input equality check for mention prompts
sendPrompt() now gates submission on expect(chatInput).toContainText(prompt), but the Lexical editor rewrites internal mention formats to display text (e.g. @app:name becomes @name, similarly for other mention types) before render. In flows that use mention syntax (for example po.sendPrompt("[dump] @app:minimal-with-ai-rules hi")), this assertion never becomes true, so the helper retries until timeout and never clicks Send despite a valid prompt.
Useful? React with πΒ / π.
|
|
||
| - **Clearing input**: `fill("")` doesn't reliably clear Lexical. Use keyboard shortcuts instead: `Meta+a` then `Backspace`. | ||
| - **Timing issues**: Lexical may need time to update its internal state. Use `toPass()` with retries for resilient tests. | ||
| - **Avoid locator drift**: When both home/chat inputs may exist, scope the editor locator to the specific container (for example `chat-input-container`) and reuse one locator instance for click/fill/assertions. |
There was a problem hiding this comment.
π‘ MEDIUM | documentation-code-mismatch
New rule contradicts sendPrompt() implementation and PR description
This new rule instructs readers to "scope the editor locator to the specific container (for example chat-input-container) and reuse one locator instance". But the actual sendPrompt() in this PR uses this.getChatInput(), which is a global unscoped locator:
getChatInput() {
return this.page.locator(
'[data-lexical-editor="true"][aria-placeholder^="Ask Dyad to build"]',
);
}That means the same placeholder under home-chat-input-container also matches. The PR description compounds this β it says "stabilize sendPrompt() by scoping to the chat-input container" and "add a short settle delay after creating a new chat in chat_tabs.spec.ts", but neither of those changes is in the current diff (the scoping was reverted after breaking home-page tests; no sleep is present in chat_tabs.spec.ts).
Future contributors will follow this rule expecting it to describe sendPrompt(), then be confused when the code does the opposite. Either:
- Update the rule to describe the actual pattern (retry-based
toPassover a global locator, with container scoping reserved for tests that need to disambiguate), and update the PR description to match, or - Refactor
sendPrompt()to pickgetChatInputContainer()vsgetHomeChatInputContainer()at runtime and scope the editor lookup inside it.
π‘ Suggestion: At minimum, fix the PR description so reviewers and git blame readers aren't misled about what changed.
| await expect(chatInput).toBeVisible(); | ||
| await expect(async () => { | ||
| await chatInput.click(); | ||
| await chatInput.fill(prompt); |
There was a problem hiding this comment.
π‘ MEDIUM | retry-correctness
Retry loop can silently produce accumulated text in the Lexical editor
Inside the toPass() callback, each retry unconditionally calls:
await chatInput.click();
await chatInput.fill(prompt);
await expect(chatInput).toContainText(prompt);
await expect(sendButton).toBeEnabled();The newly-added rule in rules/e2e-testing.md already warns that fill() is unreliable on Lexical (fill("") doesn't clear). If a retry fires β e.g., because toBeEnabled() hadn't updated yet on the first pass β the second fill(prompt) may append to the existing content instead of replacing it, leaving the editor with "First chat unique message alphaFirst chat unique message alpha".
toContainText(prompt) is a substring check, so the assertion still passes on the accumulated text. The wrong prompt is then clicked through to the LLM and the test quietly reports success. This is the classic "deflake masks a real bug" failure mode β the retry turns a loud flake into a silent correctness bug.
π‘ Suggestion: Use toHaveText(prompt) instead of toContainText(prompt) so accumulation forces another retry, or explicitly clear the editor (ControlOrMeta+a / Backspace, as clearChatInput() does) at the top of each retry attempt before calling fill().
| test("tabs appear after navigating between chats", async ({ po }) => { | ||
| await po.setUp({ autoApprove: true }); | ||
| await po.importApp("minimal"); | ||
| await po.importApp("minimal-with-ai-rules"); |
There was a problem hiding this comment.
π‘ MEDIUM | unexplained-change
Fixture swap from minimal β minimal-with-ai-rules is not explained
All 6 tests in this file switched their fixture. The only difference between the two fixtures is the presence of AI_RULES.md:
$ diff -rq e2e-tests/fixtures/import-app/minimal e2e-tests/fixtures/import-app/minimal-with-ai-rules
Only in minimal-with-ai-rules: AI_RULES.md
~15 other spec files still use minimal, so this is now an outlier, and the PR description doesn't mention the switch at all. Two problems:
- Coverage drift: the tab-UI paths are no longer exercised against the baseline
minimalfixture. If tab behavior regresses when AI rules are absent (different banners, setup flow, focus stealing), none of these tests will catch it. - Cargo-culting risk: a future maintainer won't know whether the swap was load-bearing for deflaking, or an accidental change that should be reverted.
π‘ Suggestion: If the swap is needed to avoid a focus/race condition caused by the no-AI-rules onboarding path, call that out in the PR description and add a one-line // comment next to the first importApp("minimal-with-ai-rules") explaining why. Otherwise, revert to minimal to stay consistent with the rest of the e2e suite.
| await expect(sendButton).toBeEnabled(); | ||
| }).toPass({ timeout: Timeout.MEDIUM }); | ||
|
|
||
| await sendButton.click(); |
There was a problem hiding this comment.
π’ LOW | race-condition
sendButton.click() is outside the retry loop
Once toPass() resolves, the final click is a single-shot call:
}).toPass({ timeout: Timeout.MEDIUM });
await sendButton.click();If the send button flips back to disabled between the final retry check and the actual click (e.g., React commits a state change while re-validating the Lexical content), the click will either miss or be silently swallowed β putting us right back in the flake class this PR is trying to fix.
π‘ Suggestion: After sendButton.click(), assert that the chat is actually in a sending state (spinner visible, send button disabled, or the message appears in the history) so any lost click surfaces as a loud failure rather than a timeout further downstream in waitForChatCompletion().
π Dyadbot Code Review SummaryVerdict: π€ NOT SURE - Potential issues Reviewed by 3 independent agents: Correctness Expert, Code Health Expert, UX Wizard. Findings validated against existing reviewer comments and the current state of the diff on Issues Summary
Key concern: PR description is staleThe PR description currently says:
Only (2) is actually in the diff. The scoping in (1) was reverted after it broke home-page tests ( This matters because the rule is prescriptive for future contributors, and a stale PR description makes the π« Dropped false positives / merged duplicates (4 items)
Generated by Dyadbot multi-agent code review |
π Playwright Test Resultsβ Some tests failed
Summary: 275 passed, 5 failed, 5 flaky, 6 skipped Failed Testsπ macOS
π Re-run Failing Tests (macOS)Copy and paste to re-run all failing spec files locally: npm run e2e \
e2e-tests/local_agent_advanced.spec.ts \
e2e-tests/mention_app.spec.ts \
e2e-tests/per_chat_input.spec.ts \
e2e-tests/smart_context_deep.spec.ts
|


Summary
sendPrompt()by scoping to the chat-input container and using retry-based click/fill verification for Lexicalchat_tabs.spec.tsto reduce intermittent tab-switch timing flakesTest plan
π€ Generated with Claude Code
Made with Cursor