docs: add search-replace plan by RyanGroch · Pull Request #3155 · dyad-sh/dyad

RyanGroch · 2026-04-07T14:03:34Z

Like most of the files in the plans folder, this file is AI-generated. However, I've checked it over and revised it, and to me it appears sound.

Essentially, it is a plan to test the search_replace tool in order to make sure that it's reliable, and to fix it if it's not.

wwwillchen · 2026-04-07T14:03:46Z

@BugBot run

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.

cubic-dev-ai

No issues found across 1 file

Confidence score: 5/5

Automated review surfaced no issues in the provided summaries.
No files require special attention.

wwwillchen · 2026-04-07T17:24:01Z

@BugBot run

wwwillchen · 2026-04-07T17:56:01Z

@BugBot run

wwwillchen

Overall, I think this is a pretty good plan. A few suggestions:

- I'd just use Dyad Engine with a Dyad Pro key. That will be much easier than managing 3 API keys, and it will use the same models (Dyad Engine basically proxies all the models).
- I'd have more eval cases, around 10-12, and include more complex ones (e.g. refactoring a giant 700-line React component into 3 smaller components).
- Just because a search-replace applies without error doesn't mean it's correct. You'll need to either spot-check the results or use another model (I'd probably use GPT 5.4) to judge the output. Basically, feed in the prompt + original file + output file and ask: does the output file look correct given the prompt + original file?
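
The judge step suggested above could be sketched roughly as follows. This is only an illustration, not code from the PR: `JudgeInput`, `buildJudgePrompt`, `parseVerdict`, and the PASS/FAIL reply convention are all hypothetical names and assumptions, and the actual call to the judge model is deliberately left out.

```typescript
// Hypothetical shape of one judge request; none of these names come from the PR.
interface JudgeInput {
  prompt: string; // the instruction given to the model under test
  originalFile: string; // file contents before search_replace ran
  outputFile: string; // file contents after search_replace ran
}

// Assemble the text that would be sent to the judge model.
function buildJudgePrompt(input: JudgeInput): string {
  return [
    "You are reviewing the result of an automated code edit.",
    "",
    "## Prompt",
    input.prompt,
    "",
    "## Original file",
    input.originalFile,
    "",
    "## Output file",
    input.outputFile,
    "",
    "Does the output file look correct given the prompt and the original file?",
    "Answer PASS or FAIL on the first line, then give a short reason.",
  ].join("\n");
}

// Assumed reply convention: only a first line starting with PASS counts as a pass.
function parseVerdict(reply: string): boolean {
  const firstLine = (reply.trim().split("\n")[0] ?? "").trim().toUpperCase();
  return firstLine.startsWith("PASS");
}
```

Treating anything other than an explicit PASS as a failure is a conservative choice: an empty or malformed judge reply then shows up as a failed case instead of being silently counted as correct.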

wwwillchen · 2026-04-09T04:26:56Z

@BugBot run

github-actions · 2026-04-09T05:02:11Z

🎭 Playwright Test Results

❌ Some tests failed

OS	Passed	Failed	Flaky	Skipped
🍎 macOS	407	4	5	129
🪟 Windows	404	8	2	129

Summary: 811 passed, 12 failed, 7 flaky, 258 skipped

Failed Tests

🍎 macOS

chat_input.spec.ts > send button disabled during pending proposal
- Error: expect(locator).toBeVisible() failed
queued_message.spec.ts > editing queued message restores attachments and selected components
- Error: expect(locator).toBeVisible() failed
queued_message.spec.ts > canceling queued message edit clears restored components
- Error: expect(locator).toBeVisible() failed
setup_flow.spec.ts > Setup Flow > node.js install flow
- TimeoutError: locator.click: Timeout 30000ms exceeded.

🪟 Windows

context_manage.spec.ts > manage context - smart context
- Error: expect(locator).toMatchAriaSnapshot(expected) failed
context_manage.spec.ts > manage context - smart context - auto-includes only
- Error: expect(locator).toMatchAriaSnapshot(expected) failed
context_manage.spec.ts > manage context - exclude paths
- Error: expect(locator).toMatchAriaSnapshot(expected) failed
context_manage.spec.ts > manage context - exclude paths with smart context
- Error: expect(locator).toMatchAriaSnapshot(expected) failed
github.spec.ts > create and sync to new repo
- Error: expect(locator).toHaveClass(expected) failed
github.spec.ts > create and sync to existing repo
- Error: expect(locator).toMatchAriaSnapshot(expected) failed
github.spec.ts > create and sync to existing repo - custom branch
- Error: expect(locator).toMatchAriaSnapshot(expected) failed
setup_flow.spec.ts > Setup Flow > node.js install flow
- TimeoutError: locator.dispatchEvent: Timeout 30000ms exceeded.

📋 Re-run Failing Tests (macOS)

Copy and paste to re-run all failing spec files locally:

npm run e2e \
  e2e-tests/chat_input.spec.ts \
  e2e-tests/queued_message.spec.ts \
  e2e-tests/setup_flow.spec.ts

⚠️ Flaky Tests

🍎 macOS

approve.spec.ts > write to index, approve, check preview (passed after 1 retry)
context_limit_banner.spec.ts > context limit banner shows 'running out' when near context limit (passed after 1 retry)
context_manage.spec.ts > manage context - smart context (passed after 1 retry)
logs_server.spec.ts > system messages UI shows server logs with correct type (passed after 1 retry)
setup_flow.spec.ts > Setup Flow > setup banner shows correct state when node.js is installed (passed after 1 retry)

🪟 Windows

context_manage.spec.ts > manage context - default (passed after 1 retry)
setup_flow.spec.ts > Setup Flow > setup banner shows correct state when node.js is installed (passed after 1 retry)

wwwillchen · 2026-04-09T05:08:38Z

@BugBot run

wwwillchen · 2026-04-09T15:20:52Z

@BugBot run

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e1f2d72fd5

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:

- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review"

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

wwwillchen · 2026-04-09T15:38:38Z

@BugBot run

RyanGroch · 2026-04-09T17:15:07Z

Re-requesting a review because I've made two large changes at this point:

- I changed the plan to use Dyad Pro instead of individual API keys.
- There are now two kinds of cases: "exact match" cases (checked by a simple diff) and "judge" cases, which use GPT 5.4 to judge the output. There are 7 exact match cases and 5 judge cases. In general, judge cases are the more complex ones (refactor large component, etc.).

I've also started implementing this, so I'll likely open another PR soon. I can still address any other changes that are needed though.
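
For illustration, the "simple diff" check for exact match cases might look something like the sketch below. The helper names `normalize` and `firstMismatch` are hypothetical, and normalizing CRLF to LF is an assumption made here (the tests run on both macOS and Windows), not something the plan states.

```typescript
// Assumption: treat CRLF and LF line endings as equivalent when comparing files.
function normalize(text: string): string {
  return text.replace(/\r\n/g, "\n");
}

// Hypothetical helper: returns the 1-based number of the first differing line,
// or null when the expected and actual contents match exactly.
function firstMismatch(expected: string, actual: string): number | null {
  const expectedLines = normalize(expected).split("\n");
  const actualLines = normalize(actual).split("\n");
  const lineCount = Math.max(expectedLines.length, actualLines.length);
  for (let i = 0; i < lineCount; i++) {
    // A missing line on either side compares as undefined and counts as a mismatch.
    if (expectedLines[i] !== actualLines[i]) return i + 1;
  }
  return null;
}
```

Reporting the first mismatching line, as opposed to a bare true/false, makes a failed exact match case quicker to inspect by hand.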

docs: add search-replace plan

2f1a1b5

RyanGroch requested a review from a team April 7, 2026 14:03

RyanGroch temporarily deployed to ai-bots April 7, 2026 14:03 — with GitHub Actions Inactive

RyanGroch had a problem deploying to ai-bots April 7, 2026 14:03 — with GitHub Actions Failure

devin-ai-integration bot reviewed Apr 7, 2026

This comment was marked as resolved.

cubic-dev-ai bot reviewed Apr 7, 2026

github-actions bot added the needs-human:review-issue label (ai agent flagged an issue that requires human review) Apr 7, 2026

fix: escape strings

768a06a

RyanGroch temporarily deployed to ai-bots April 7, 2026 17:23 — with GitHub Actions Inactive

RyanGroch had a problem deploying to ai-bots April 7, 2026 17:23 — with GitHub Actions Failure

This comment was marked as resolved.

fix: use production temperatures

cabc656

RyanGroch temporarily deployed to ai-bots April 7, 2026 17:55 — with GitHub Actions Inactive

RyanGroch had a problem deploying to ai-bots April 7, 2026 17:55 — with GitHub Actions Failure

wwwillchen approved these changes Apr 7, 2026

Comment thread plans/search-replace-eval.md Outdated

Comment thread plans/search-replace-eval.md Outdated

RyanGroch added 5 commits April 8, 2026 09:55

fix: use Dyad pro instead of individual API keys

b2cb8b5

adds judge-verified cases to plan

98f80fb

makes tests concurrent

2f35a74

Use stopWhen instead of maxSteps

2219ae5

fix Dyad Pro streaming logic

dd96a49

RyanGroch had a problem deploying to ai-bots April 9, 2026 04:26 — with GitHub Actions Failure

RyanGroch temporarily deployed to ai-bots April 9, 2026 04:26 — with GitHub Actions Inactive

This comment was marked as resolved.

asserts file correctness and adds gpt 5.4 constant

65b19f5

RyanGroch temporarily deployed to ai-bots April 9, 2026 05:08 — with GitHub Actions Inactive

RyanGroch had a problem deploying to ai-bots April 9, 2026 05:08 — with GitHub Actions Failure

This comment was marked as resolved.

enforce single tool call in exact-match cases

e1f2d72

RyanGroch had a problem deploying to ai-bots April 9, 2026 15:20 — with GitHub Actions Failure

RyanGroch temporarily deployed to ai-bots April 9, 2026 15:20 — with GitHub Actions Inactive

chatgpt-codex-connector bot reviewed Apr 9, 2026

Comment thread plans/search-replace-eval.md

Comment thread plans/search-replace-eval.md

RyanGroch added 2 commits April 9, 2026 10:33

enforce correct file in judge mode; enforce no file rewrites

43ed3aa

define assertNotFullFileRewrite

2cfa0c8

RyanGroch had a problem deploying to ai-bots April 9, 2026 15:38 — with GitHub Actions Failure

RyanGroch temporarily deployed to ai-bots April 9, 2026 15:38 — with GitHub Actions Inactive

RyanGroch requested a review from wwwillchen April 9, 2026 17:24
