Add agent-driven e2e testing framework for desktop by lifeizhou-ap · Pull Request #8032 · block/goose

lifeizhou-ap · 2026-03-20T12:35:38Z

Summary

Created a new e2e testing framework for the Goose desktop app using agent-browser CLI, enabling test creation via natural language and deterministic replay in CI.

Features

create-e2e-test and e2e-app skills for creating e2e tests via natural language
debug_e2e_failures recipe for local/CI e2e failure diagnosis
Scripts to start, stop, and manage isolated Goose desktop instances
Parallel test execution with e2e-run-all.sh with logs, videos and failed test screenshot for debugging
e2e test recordings covering chat, extensions, recipes, and settings. The tests are json version of agent-browser CLI
e2e tests run automatically in CI via pr-smoke-test.yml and generated artifacts with logs, videos and failed test screenshot for debugging

Changes

ui/desktop/src/main.ts: E2E mode support — userData isolation, fixed window size, dialog auto-confirm, GOOSE_WORKING_DIR override
ui/desktop/tests/e2e-tests/scripts/: e2e-start.sh, e2e-stop.sh, e2e-run-all.sh, replay.sh
ui/desktop/tests/e2e-tests/recordings/: 9 batch test files
ui/desktop/tests/e2e-tests/scenarios/: Human-readable test scenarios
.agents/skills/create-e2e-test/, .agents/skills/e2e-app/: Skills for test authoring
workflow_recipes/debug_e2e_failures/: Recipe for local/CI e2e failure diagnosis
.github/workflows/pr-smoke-test.yml: e2e CI integration
ui/desktop/tests/e2e-tests/scripts/e2e-start.sh: Local dev config via ~/.config/goose/e2e.env

Note

If this pattern works well, we can remove the existing playwright e2e tests as they are already not actively maintained now

* main: (22 commits) feat: add gemini-acp provider, update docs on subscription models + improvements to codex (#8000) fix(openai): use Responses API for gpt-5.4 (#7982) Remove lead/worker provider (#7989) chore(release): release version 1.28.0 (#7991) Fix empty tool results from resource content (e.g. auto visualiser) (#7866) Separate SSE streaming from POST work submission (#7834) fix: include token usage in Databricks streaming responses (#7959) Optimize tool summarization (#7938) fix: overwrite the deprecated googledrive extension config (#7974) refactor: remove unnecessary Arc<Mutex> from tool execution pipeline (#7979) Revert message flush & test (#7966) docs: add Remote Access section with Telegram Gateway documentation (#7955) fix: update webmcp blog post metadata image URL (#7967) fix: clean up OAuth token cache on provider deletion (#7908) fix: hard-coded tool call id in code mode callback (#7939) Fix SSE parsers to accept optional space after data: prefix (#7929) docs: add GOOSE_INPUT_LIMIT to config-files.md (#7961) Add WebMCP for Beginners blog post (#7957) Fix download manager (#7933) Improve the formatting of tool calls, show thinking, treat Reasoning and Thinking as the same thing (sorry Kant) (#7626) ...

.github/workflows/pr-smoke-test.yml

* main: (37 commits) fix: handle reasoning content blocks in OpenAI-compat streaming parser (#8078) chore(acp): build native packages on latest mac (#8075) Display delegate sub agents logs in UI (#7519) Update tar version to avoid CVE-2026-33056 (#8073) refactor: consolidate duplicated dependencies into workspace (#8041) tui: set up for publishing via github actions (#8020) feat: feature-gate local inference dependencies (#7976) feat: ability to manage sub recipes in desktop ui (#6360) Tweak the release process: no more merge to main (#7994) fix: gemini models via databricks (#8042) feat(apps): Pass toolInfo to MCP Apps via hostContext (#7506) fix: remove configured marker when deleting oauth provider configuration (#7887) docs: add vmware-aiops MCP extension documentation (#8055) Show setup instructions for ACP providers in settings modal (#8065) deps: replace sigstore-verification with sigstore-verify to kill vulns (#8064) feat(acp): add session/set_config and stabilize list, delete and close (#7984) docs: Correct `gosoe` typo to `goose` (#8062) fix: use default provider and model when provider in session no longer exists (#8035) feat: add GOOSE_SHELL env var to configure preferred shell (#7909) fix(desktop): fullscreen header bar + always-visible close controls (#8033) ...

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b1fc2bf84b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-24T01:41:30Z

ui/desktop/tests/e2e-tests/scripts/replay.sh

+
+  STEP=$((i + 1))
+  echo "[$STEP/$TOTAL] agent-browser ${GLOBAL_ARGS[*]} ${ARGS[*]} --timeout $CMD_TIMEOUT_MS"
+  if ! timeout "$CMD_TIMEOUT" pnpm exec agent-browser "${GLOBAL_ARGS[@]}" "${ARGS[@]}"; then


Respect command-specific timeouts in replay

replay.sh hard-caps every agent-browser command to 10 seconds via timeout "$CMD_TIMEOUT" ..., so recorded steps that intentionally request longer waits (for example wait --timeout 60000/120000 in the new extension recordings) will still be killed at 10s. This makes long-running LLM/tool steps fail even when the recording specifies a larger timeout, causing deterministic false failures in the new E2E workflow.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-03-24T01:41:30Z

ui/desktop/src/main.ts

+  // In e2e test mode, auto-confirm dialogs so CDP-based tests can proceed
+  // without needing to interact with native OS dialogs.
+  if (process.env.ENABLE_PLAYWRIGHT) {
+    return { response: 1 };


Return the actual affirmative dialog response in E2E mode

In ENABLE_PLAYWRIGHT mode this handler always returns { response: 1 }, but several callers treat index 0 as the confirm action (e.g., KeyboardShortcutsSection conflict/reset dialogs). That means these dialogs are auto-cancelled instead of auto-confirmed during E2E runs, so test behavior diverges from user behavior and some flows become untestable.

Useful? React with 👍 / 👎.

The current E2E tests (eg: delete recipe) that trigger showMessageBox all use ['Cancel', 'Action'] button order, so response: 1 correctly auto-confirms them. The keyboard shortcut dialogs that use index 0 as confirm aren't used in any E2E test today. We'll revisit if that changes.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3bef76bc45

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-24T02:42:29Z

ui/desktop/tests/e2e-tests/scripts/e2e-start.sh

+    PORT=$((9300 + RANDOM % 100))
+    if ! lsof -i :"$PORT" &>/dev/null; then
+      echo "$PORT"


Reserve CDP ports atomically across parallel workers

pick_port does a check-then-use (lsof then echo) on a shared 9300–9399 range, but e2e-run-all.sh launches multiple tests concurrently (xargs -P "$WORKERS"). Two workers can select the same still-unbound port, which makes one session connect to the wrong Electron instance or fail nondeterministically once one process binds first. This creates cross-test contamination and flaky CI outcomes in the new parallel runner.

Useful? React with 👍 / 👎.

addressed this

chatgpt-codex-connector · 2026-03-24T02:42:29Z

ui/desktop/tests/e2e-tests/scripts/e2e-run-all.sh

+  if ! CDP_PORT=$(wait_for_app "$TEST_NAME"); then
+    local DURATION=$(( SECONDS - START_TIME ))
+    echo "[$TEST_NAME] FAIL — app did not start within 30s (${DURATION}s)"
+    echo "FAIL ${DURATION}s" > "$STATUS_DIR/$TEST_NAME"
+    return


Always stop the test session on startup timeout

When app startup does not become ready within 30s, run_one returns early before reaching the teardown call (e2e-stop.sh). Since the app was launched in a detached screen session, this path can leave orphaned Electron/goosed processes and stale session artifacts, which then interfere with subsequent recordings in the same run (e.g., port conflicts and residual state).

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2dcb343a45

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-24T05:32:27Z

ui/desktop/tests/e2e-tests/scripts/e2e-stop.sh

+  if [[ -n "$CDP_PORT" ]]; then
+    lsof -ti :"$CDP_PORT" 2>/dev/null | xargs kill -9 2>/dev/null || true
+    pkill -9 -f "remote-debugging-port=$CDP_PORT" 2>/dev/null || true


Stop goosed cleanly instead of SIGKILLing Electron

This teardown path force-kills the CDP listener with kill -9, which bypasses Electron shutdown hooks that normally call goosedResult.cleanup() in main.ts. Because goosed is spawned as goosed agent (no session path in argv), the later pattern kill in e2e-stop.sh does not reliably target it, so tests can leave orphan backend processes running and contaminate subsequent runs.

Useful? React with 👍 / 👎.

The primary shutdown is screen -X quit now, which sends SIGHUP and lets Electron run its shutdown hooks including goosedResult.cleanup(). The kill -9 lines after that are safety nets that only fire on processes that survived the graceful shutdown.

chatgpt-codex-connector · 2026-03-24T05:32:27Z

ui/desktop/tests/e2e-tests/scripts/e2e-run-all.sh

+trap 'echo ""; echo "Interrupted, stopping active E2E sessions..."; cleanup_and_exit 130' INT TERM
+
+# Run recordings in parallel with worker limit
+printf '%s\n' "${RECORDINGS[@]}" | xargs -P "$WORKERS" -I {} bash -c "run_one '{}' '$STATUS_DIR'" || true


Gate parallel workers when single-instance lock applies

The runner always executes recordings in parallel, but the app still enforces app.requestSingleInstanceLock() on non-macOS in main.ts. On Linux/Windows, only one Electron instance can remain alive, so additional workers will fail startup and be reported as test failures; this needs a platform guard (or forcing --workers 1) to avoid deterministic false negatives outside macOS.

Useful? React with 👍 / 👎.

we only supports macos at the moment

chatgpt-codex-connector · 2026-03-24T05:32:27Z

ui/desktop/tests/e2e-tests/scripts/e2e-run-all.sh

+  timeout "$TIMEOUT" bash "$SCRIPT_DIR/replay.sh" \
+    "$RECORDING" \


Respect scenario time budgets before test-level timeout

Each recording is wrapped in a fixed timeout "$TIMEOUT" (default 120s), but new recordings include waits up to 120s after earlier long waits (for example, extension-install-playwright.batch.json has 60s + 60s + 120s waits). This causes the harness to terminate otherwise valid long-running scenarios before their own declared timeouts elapse, producing avoidable TIMEOUT failures.

Useful? React with 👍 / 👎.

removed this timeout as we already have timeout when running agent browser command

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8281334588

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-24T06:37:53Z

ui/desktop/src/main.ts

  const goosedResult = await startGoosed({
    serverSecret,
-    dir: dir || os.homedir(),
+    dir: process.env.GOOSE_WORKING_DIR || dir || os.homedir(),


Preserve requested chat directory over env default

createChat now always prefers process.env.GOOSE_WORKING_DIR over the per-window dir argument, so any flow that passes a specific directory (e.g., deep links/new-window creation with dir) is silently ignored whenever that env var is set. This regresses directory-scoped behavior in those launches: users/tests can request a different folder, but startGoosed still starts in the env folder. Restricting the env override to initial defaulting (or to explicit E2E mode) avoids breaking dir-driven window creation.

Useful? React with 👍 / 👎.

It is fine now. GOOSE_WORKING_DIR never worked before this PR. This change is specifically for e2e test isolation. Will create a follow-up PR to consolidate the existing duplicated openDir logic so directory resolution is handled in one place.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0430864b99

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-24T11:41:45Z

ui/desktop/tests/e2e-tests/scripts/e2e-stop.sh

+
+pkill -9 -f "$BASE_DIR" 2>/dev/null || true
+pkill -9 -f 'agent-browser-chrome' 2>/dev/null || true
+pkill -9 -f 'agent-browser-darwin\|agent-browser-linux' 2>/dev/null || true


Use ERE alternation when killing agent-browser helpers

pkill patterns are interpreted as regular expressions (BSD/FreeBSD pgrep/pkill docs describe these as extended regex), so alternation should be |, not \|. With the current pattern, this command matches a literal pipe and does not match agent-browser-darwin or agent-browser-linux, leaving helper processes alive after teardown; those leftovers can interfere with later E2E runs on the same machine.

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c0c9c54d77

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-24T11:56:18Z

ui/desktop/tests/e2e-tests/scripts/e2e-stop.sh

+    rm -f "$BASE_DIR/.port-locks/$CDP_PORT"
+  fi
+
+  pkill -9 -f "$SESSION_DIR" 2>/dev/null || true


Kill goosed explicitly for per-session teardown

This teardown path does not reliably terminate the backend process after force-killing Electron. stop_session only pattern-kills by "$SESSION_DIR", but startGoosed launches goosed as goosed agent with the session path only in cwd (not argv), so that match can miss; and in the normal runner path we call e2e-stop.sh "$TEST_NAME", which exits before the global fallback sweeps at the bottom of the script. Fresh evidence: ui/desktop/src/goosed.ts uses spawnArgs = ['agent'] and cwd: workingDir, so argv-based pkill -f "$SESSION_DIR" is not a reliable cleanup mechanism. When this miss happens, orphan goosed processes can leak into later tests and cause cross-test contamination/flakes.

Useful? React with 👍 / 👎.

screen -X quit is the primary kill — it triggers Electron's graceful shutdown which SIGTERMs goosed. The pkill lines are last-resort fallbacks. We also fixed a screen session name mismatch that was preventing the graceful path from working.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 278f91582c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-24T14:09:47Z

.github/workflows/pr-smoke-test.yml

+        uses: Swatinem/rust-cache@42dc69e1aa15d09112580998cf2ef0119e2e91ae  # v2
+
+      - name: Install GNU timeout (if missing)
+        run: command -v timeout || brew install coreutils


Make timeout available on macOS before running E2E

On macos-latest, this step only runs brew install coreutils when timeout is missing, but the replay harness later calls timeout directly for every command. Homebrew coreutils installs gtimeout unless the gnubin path is added, so in the common case where timeout is absent the E2E run can still fail with timeout: command not found before tests execute. Please either export the coreutils gnubin directory (or symlink) here, or update the scripts to fall back to gtimeout.

Useful? React with 👍 / 👎.

timeout wasn't available on the macOS runner by default, so this step installs coreutils as a fallback. Once installed, timeout is available on PATH and the scripts work. This has been verified in CI.

chatgpt-codex-connector · 2026-03-24T14:09:47Z

workflow_recipes/debug_e2e_failures/recipe.yaml

+    gh run download {run_id} --repo block/goose --name e2e-test-results --dir /tmp/e2e-results 2>&1
+    # example: result_url: https://github.com/block/goose/actions/runs/23487840554/artifacts/6079302235
+    # gh run download 23487840554 --repo block/goose --name e2e-test-results --dir /tmp/e2e-results 2>&1
+    rm -rf ui/desktop/tests/e2e-tests/results-ci
+    unzip -o /tmp/e2e-results.zip -d ui/desktop/tests/e2e-tests/results-ci


Drop nonexistent unzip step after gh artifact download

When results_url is provided, these instructions call gh run download ... --dir /tmp/e2e-results and then immediately unzip /tmp/e2e-results.zip. gh run download already extracts artifacts into the target directory, so this zip file is not produced; the extra unzip command will fail and block Step 1 for CI-artifact debugging runs.

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d38e0721cd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-25T00:29:24Z

ui/desktop/tests/e2e-tests/scripts/e2e-run-all.sh

+    if [[ $waited -ge 120 ]]; then
+      echo "[$(ts)] WARNING: startup lock stale after 60s, breaking it"
+      rm -rf "$STARTUP_LOCK"


Check lock age before breaking startup lock

The stale-lock logic removes STARTUP_LOCK after a waiter has blocked for 60 seconds, but that does not prove the lock holder crashed. If a legitimate startup takes longer than 60s (e.g., slow electron-forge rebuild), another worker will delete the active lock and start concurrently, reintroducing the .vite/build race this lock is intended to prevent and causing intermittent startup/test failures.

Useful? React with 👍 / 👎.

Startup usually takes less than 60s. stale lock blocking CI forever might be worse than a rare race condition.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 872f0273af

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-25T01:34:08Z

ui/desktop/tests/e2e-tests/scripts/e2e-run-all.sh

+    release_startup_lock
+    local DURATION=$(( SECONDS - START_TIME ))
+    echo "[$(ts)] [$TEST_NAME] FAIL — app did not start within 30s (${DURATION}s)"
+    echo "FAIL ${DURATION}s" > "$STATUS_DIR/$TEST_NAME"
+    bash "$SCRIPT_DIR/e2e-stop.sh" "$TEST_NAME" 2>/dev/null || true


Hold startup lock until timed-out app is fully stopped

On the startup-timeout path, the script releases STARTUP_LOCK before calling e2e-stop.sh. If the timed-out electron-forge process is still rebuilding .vite/build when the 30s wait expires, another worker can immediately acquire the lock and start a second app concurrently, reintroducing the same shared-build race this lock is meant to prevent. Keep the lock held until teardown of the timed-out session completes.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-03-25T01:34:08Z

ui/desktop/tests/e2e-tests/scripts/e2e-stop.sh

+if [[ ! -d "$SESSIONS_DIR" ]]; then
+  exit 0


Run global fallback cleanup even without session dirs

This early exit 0 skips the global fallback pkill cleanup at the bottom of the script whenever /tmp/goose-e2e/sessions is absent. In the normal runner flow, per-test teardown removes session directories, so a later all-sessions stop can bypass fallback cleanup and leave orphan helper processes alive from previous failures, which can interfere with subsequent runs. The script should skip only the per-session loop, not terminate before global cleanup.

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cc8d2de0e2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-25T01:49:34Z

ui/desktop/tests/e2e-tests/scripts/e2e-run-all.sh

+    if [[ $waited -ge 120 ]]; then
+      echo "[$(ts)] WARNING: startup lock stale after 60s, breaking it"
+      rm -rf "$STARTUP_LOCK"


Break startup lock only once after stale detection

Fresh evidence: because waited is never reset, once a worker has waited 60s it runs rm -rf "$STARTUP_LOCK" on every 0.5s retry. If another worker is legitimately still starting Electron, this loop can repeatedly delete an active lock and let multiple startups run concurrently, which reintroduces the shared .vite/build race this lock is meant to prevent. Limit stale-lock stealing to a single verified stale lock (e.g., based on lock age/owner) before retrying.

Useful? React with 👍 / 👎.

you put the same comment as above. I have replied with

"Startup usually takes less than 60s. stale lock blocking CI forever might be worse than a rare race condition."

* main: Fix user message text silently dropped when message contains both text and image (#8071) fix: use pnpm when publishing (#8092) Alexhancock/publish npm file format fix (#8091) fix: iteration on the publish-npm workflow (#8087) chore: ignore unmaintained warning for proc-macro-error (#8084) chore: clean up stray recipe.yaml (#8086) chore(release): bump version to 1.29.0 (minor) (#8088) Update lockfile references (#8085) Fix version bump (#8083) Add a code review step which uses a short-lived provider token (#7932)

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 912240b7a6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-25T22:44:28Z

ui/desktop/tests/e2e-tests/recordings/extension-install-playwright.batch.json

@@ -0,0 +1,17 @@
+[
+  ["wait", "[data-testid='chat-input']", "--timeout", "60000"],
+  ["eval", "window.electron.emit('add-extension', {}, 'goose://extension?cmd=npx&arg=%40playwright%2Fmcp%40latest&id=playwright&name=Playwright&description=Interact%20with%20web%20pages%20through%20structured%20accessibility%20snapshots%20using%20Playwright')"],


Pin Playwright MCP dependency to a fixed version

This recording installs Playwright MCP via @latest, so each CI run can pull different upstream code and silently change behavior or break setup even when this repo is unchanged. That makes the new E2E suite non-reproducible and can create false failures unrelated to the PR under test; please pin a concrete package version in the deeplink command.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-03-25T22:44:29Z

ui/desktop/tests/e2e-tests/recordings/extension-install-playwright.batch.json

+  ["wait", "[data-testid='nav-new-chat-btn']"],
+  ["find", "testid", "nav-new-chat-btn", "click"],
+  ["wait", "[data-testid='chat-input']"],
+  ["find", "testid", "chat-input", "fill", "Use the Playwright browser to navigate to google.com and search for cats. Use the browser_navigate tool to go to https://www.google.com and then use browser_type and browser_click to search for cats."],


Avoid live Google dependency in deterministic replay

This step drives a real google.com search and then asserts success within a short timeout, which makes the test depend on external network availability, consent/CAPTCHA behavior, and third-party page changes. In CI this can fail even when Goose itself is healthy, so the scenario should target a controlled fixture/local page (or mocked endpoint) to keep replay deterministic.

Useful? React with 👍 / 👎.

lifeizhou-ap added 12 commits March 18, 2026 12:47

install agent browser

194945b

instructions to write e2e test

b127a87

first test

f850b18

e2e test setup script

472ccf5

add skills and generated e2e test

5144227

scripts to run all recorded tests

f9f2259

adjust the skills

bdb89fa

reorganised folder

300eece

custom extension test

26db973

adding more tests

f90c636

enable running e2e test in ci

aa0822e

github-advanced-security bot found potential problems Mar 23, 2026

View reviewed changes

.github/workflows/pr-smoke-test.yml Fixed Show fixed Hide fixed

.github/workflows/pr-smoke-test.yml Fixed Show fixed Hide fixed

lifeizhou-ap added 7 commits March 24, 2026 10:37

resolve pnpm install

99ed02f

updated tests and scripts

1a801c8

added action sha

ec54a48

fixed the test and setup

22c4dd1

install timeout

3281469

update justfile

b1fc2bf

lifeizhou-ap marked this pull request as ready for review March 24, 2026 01:36

chatgpt-codex-connector bot reviewed Mar 24, 2026

View reviewed changes

changed the structure of the report

3bef76b

chatgpt-codex-connector bot reviewed Mar 24, 2026

View reviewed changes

lifeizhou-ap added 2 commits March 24, 2026 15:45

fixed log file output

6209133

more script change

2dcb343

chatgpt-codex-connector bot reviewed Mar 24, 2026

View reviewed changes

fixed unstable test

8281334

chatgpt-codex-connector bot reviewed Mar 24, 2026

View reviewed changes

added more timeout for ci run

64af9d9

lifeizhou-ap added 2 commits March 24, 2026 18:49

added recording

c507605

removed per test timeout

0430864

chatgpt-codex-connector bot reviewed Mar 24, 2026

View reviewed changes

adjust view port to run ci

c0c9c54

chatgpt-codex-connector bot reviewed Mar 24, 2026

View reviewed changes

lifeizhou-ap added 2 commits March 25, 2026 00:43

fixed view port problem and first loading page

ce6119f

start app one by one, but run test in parallel

278f915

chatgpt-codex-connector bot reviewed Mar 24, 2026

View reviewed changes

lifeizhou-ap added 2 commits March 25, 2026 10:42

address review comments

a7afadb

make provider, model, api key for testing configurable

d38e072

chatgpt-codex-connector bot reviewed Mar 25, 2026

View reviewed changes

lifeizhou-ap added 2 commits March 25, 2026 11:39

unset unnecessary variable

8e49b1f

use different env

872f027

chatgpt-codex-connector bot reviewed Mar 25, 2026

View reviewed changes

lifeizhou-ap added 2 commits March 25, 2026 12:39

address review comments

50b849b

commit missed change

cc8d2de

lifeizhou-ap changed the title ~~Lifei/agent browser~~ Add agent-driven e2e testing framework for desktop Mar 25, 2026

chatgpt-codex-connector bot reviewed Mar 25, 2026

View reviewed changes

lifeizhou-ap added 2 commits March 25, 2026 13:14

added readme

46dd004

jh-block self-assigned this Mar 25, 2026

set default timeout to 30000

912240b

chatgpt-codex-connector bot reviewed Mar 25, 2026

View reviewed changes

		timeout "$TIMEOUT" bash "$SCRIPT_DIR/replay.sh" \
		"$RECORDING" \

Conversation

lifeizhou-ap commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Features

Changes

Note

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

lifeizhou-ap Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

lifeizhou-ap Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

lifeizhou-ap Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot left a comment

lifeizhou-ap commented Mar 20, 2026 •

edited

Loading

lifeizhou-ap Mar 24, 2026 •

edited

Loading

lifeizhou-ap Mar 24, 2026 •

edited

Loading

lifeizhou-ap Mar 24, 2026 •

edited

Loading

lifeizhou-ap Mar 25, 2026 •

edited

Loading