Lifei/agent browser #8032
Draft
lifeizhou-ap wants to merge 12 commits into main from lifei/agent-browser
+1,130
−33
Commits (12, all by lifeizhou-ap):
194945b install agent browser
b127a87 instructions to write e2e test
d582941 Merge branch 'main' into lifei/agent-browser
f850b18 first test
472ccf5 e2e test setup script
5144227 add skills and generated e2e test
f9f2259 scripts to run all recorded tests
bdb89fa adjust the skills
300eece reorganised folder
26db973 custom extension test
f90c636 adding more tests
aa0822e enable running e2e test in ci
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,190 @@ | ||
| --- | ||
| name: create-e2e-test | ||
| description: Create replayable e2e tests for the Goose desktop app. Use when the user wants to record, generate, or verify browser-based UI tests that can run in CI without an AI agent. | ||
| --- | ||
|
|
||
| # Create E2E Test | ||
|
|
||
| You are an AI agent that creates replayable e2e test scenarios for the Goose desktop app using agent-browser CLI. | ||
|
|
||
| ## Goal | ||
|
|
||
| Given a test scenario in natural language, you will: | ||
|
|
||
| 1. Explore the app using agent-browser | ||
| 2. Record a set of deterministic CLI commands as a batch file that can be replayed without an AI agent | ||
|
|
||
| **Do NOT read source code to understand the UI.** Do not read `.tsx`, `.ts`, or `.css` files to find elements. Use `snapshot` to discover what is on the page — that is your only method. The one exception: read source code only when you need to add a `data-testid` attribute. | ||
|
|
||
| ## App Lifecycle | ||
|
|
||
| Every time you need a clean app state — whether starting for the first time, retrying during exploration, or verifying a recording — follow these steps: | ||
|
|
||
| 1. Use the `e2e-app` skill to stop any running instance and start a new one. Note the **test session name** (e.g., `260320-170823`) and **CDP port**. | ||
| 2. Connect agent-browser to the CDP port using the test session name: | ||
| ```bash | ||
| pnpm exec agent-browser --session <test-session-name> connect <port> | ||
| ``` | ||
|
|
||
| ### Agent-browser Session Isolation | ||
|
|
||
| agent-browser uses `--session` to isolate browser contexts. This prevents multiple agents or tests from interfering with each other. | ||
|
|
||
| - **Agent (exploration + replay)**: always use the current app's test session name (e.g., `--session 260320-170823`). Pass it to **every** agent-browser command and to the replay script via `--browser-session`. | ||
| - **In batch JSON**: do **not** include test session names — the replay script handles this. | ||
| - **CI**: no `--session` flag needed — the replay script defaults to the recording filename (e.g., `settings-dark-mode.batch.json` → `settings-dark-mode`). | ||
|
|
||
| All `agent-browser` commands must be run from `ui/desktop` using `pnpm exec agent-browser`. | ||
|
|
||
| ## Workflow | ||
|
|
||
| ### Phase 1: Explore and Record | ||
|
|
||
| 1. Start the app using the App Lifecycle steps above. | ||
|
|
||
| 2. Walk through the test scenario step by step. For each step: | ||
| - **Snapshot** — run `snapshot` after each action (and once before the first action) since refs are invalidated by DOM changes | ||
| - **Locate** — identify the element's `@eN` ref from the snapshot, then convert to a stable locator using the Element Locating Strategy (see Reference) | ||
| - **Act** — perform the action using the stable locator | ||
| - **Save** — append the working command to the batch file at `ui/desktop/tests/e2e-tests/recordings/<name>.batch.json` | ||
|
|
||
| If you need a clean app state at any point, restart using the App Lifecycle steps, then replay the saved batch file to catch up before continuing. | ||
|
|
||
| Rules: | ||
| - Use `wait --load networkidle` before snapshotting slow pages | ||
| - Check `agent-browser errors` if something seems wrong | ||
| - Never use `@eN` refs in the recording — convert to stable locators immediately | ||
|
|
||
| Example (assuming the app's test session name is `260320-170823`): | ||
| ```bash | ||
| # Snapshot | ||
| agent-browser --session 260320-170823 snapshot | ||
| # Output: | ||
| # - textbox "Chat input" [ref=e2] | ||
| # - button "Send" [ref=e3] | ||
|
|
||
| # Locate — get test-id for @e2 | ||
| agent-browser --session 260320-170823 get attr @e2 data-testid | ||
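| # Output: chat-input | ||
| agent-browser --session 260320-170823 get count "[data-testid='chat-input']" | ||
| # Output: 1 | ||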
|
|
||
| # Act — count is 1, so find testid works | ||
| agent-browser --session 260320-170823 find testid "chat-input" fill "hello" | ||
|
|
||
| # Snapshot again | ||
| agent-browser --session 260320-170823 snapshot | ||
|
|
||
| # Locate — get test-id for @e3 | ||
| agent-browser --session 260320-170823 get attr @e3 data-testid | ||
| # Output: send-button | ||
| agent-browser --session 260320-170823 get count "[data-testid='send-button']" | ||
| # Output: 2 — duplicate! scope to active session | ||
|
|
||
| # Act — count > 1, so narrow the selector to target a unique match | ||
| agent-browser --session 260320-170823 click "[data-active-session='true'] [data-testid='send-button']" | ||
| ``` | ||
|
|
||
| 3. Review the test scenario step by step and confirm you have a recorded command for each one. If any steps are missing, go back to step 2. | ||
|
|
||
| Example batch file (`ui/desktop/tests/e2e-tests/recordings/<name>.batch.json`): | ||
|
|
||
| ```json | ||
| [ | ||
| ["wait", "[data-testid='chat-input']"], | ||
| ["fill", "[data-active-session='true'] [data-testid='chat-input']", "hello"], | ||
| ["wait", "[data-active-session='true'] [data-testid='send-button']"], | ||
| ["click", "[data-active-session='true'] [data-testid='send-button']"], | ||
| ["wait", "--text", "Response"] | ||
| ] | ||
| ``` | ||
|
|
||
| Do **not** include in the batch file: `snapshot`, `get`, `diff`, `console`, `errors`, `open`, `connect` | ||
|
|
||
| **Never** use `wait <ms>` (e.g., `wait 3000`) in the batch file. Always wait for a specific condition: | ||
| - `wait "[data-testid='element']"` — wait for an element to appear | ||
| - `wait --text "some text"` — wait for text to appear | ||
| - `wait --load networkidle` — wait for page to finish loading | ||
| - `wait --url "**/path"` — wait for navigation | ||
|
|
||
| ### Phase 2: Verify the Recording | ||
|
|
||
| 1. Add `wait` commands before actions on dynamic elements. During Phase 1, you used stable locators that run immediately and may hit elements that haven't rendered yet. Add a `wait` before any action that targets a dynamic element: | ||
|
|
||
| Before: | ||
| ```bash | ||
| find testid "chat-response" click # fails — element not yet on page | ||
| ``` | ||
|
|
||
| After: | ||
| ```bash | ||
| wait "[data-testid='chat-response']" | ||
| find testid "chat-response" click | ||
| ``` | ||
|
|
||
| 2. Restart the app using the App Lifecycle steps. | ||
|
|
||
| 3. Replay the recording: | ||
| ```bash | ||
| bash ui/desktop/tests/e2e-tests/scripts/replay.sh recordings/<name>.batch.json --connect <port> --browser-session <test-session-name> | ||
| ``` | ||
| Always pass the current app test session name. Exit code 0 = pass, non-zero = fail. | ||
|
|
||
| 4. If replay fails, restart the app, explore the failing step using the Phase 1 cycle (snapshot → locate → convert → act) to find the fix, update the recording, and go back to step 2. | ||
|
|
||
| ### Phase 3: Write the Scenario | ||
|
|
||
| After the recording is verified, write (or update) a scenario file at `ui/desktop/tests/e2e-tests/scenarios/<name>.md` (same base name as the recording, e.g., `settings-dark-mode.batch.json` → `settings-dark-mode.md`). This is a human-readable description of what the test does — the intent, not the implementation. | ||
|
|
||
| - Describe each step in terms of **user actions and expected outcomes**, not selectors or test IDs | ||
| - Keep it concise — one line per step. The file should only contain a title and numbered steps, nothing else | ||
| - The scenario serves as the source of truth for re-recording if the test breaks | ||
|
|
||
| Example (`scenarios/settings-dark-mode.md`): | ||
| ```markdown | ||
| # Settings: Dark Mode Toggle | ||
|
|
||
| 1. Open Settings | ||
| 2. Navigate to the App tab | ||
| 3. Verify the app is in light mode | ||
| 4. Switch to dark mode and verify it applies | ||
| 5. Switch back to light mode and verify it applies | ||
| ``` | ||
|
|
||
| ## Reference | ||
|
|
||
| ### Element Locating Strategy | ||
|
|
||
| **Always** verify uniqueness with `get count` before using any locator. If count > 1, narrow the selector or fall back to the next strategy. | ||
|
|
||
| For each element, find a stable locator using this priority: | ||
|
|
||
| 1. **Semantic locator (preferred)**: use the role and name directly from the snapshot (e.g., `button "Send"` → `find role button --name "Send" click`). Never use a bare role without `--name`. | ||
| - Count is 1 → use `find role <role> --name "<name>" <action>` | ||
| - Count > 1 → fall back to step 2 | ||
|
|
||
| 2. **Test ID**: `get attr @eN data-testid` → if exists, use `find testid "<id>" <action>`. | ||
| - If count > 1 and the element is inside a chat session, scope to `[data-active-session='true'] [data-testid='<id>']` | ||
| - If still count > 1, use `find first "[data-testid='<id>']" <action>` or `find nth <index> "[data-testid='<id>']" <action>` (0-based index) | ||
|
|
||
| 3. **Add a data-testid (last resort)**: if neither of the above works, add a `data-testid` to the source code. | ||
| - Names must be globally unique and unambiguous. Include the parent component or location, the element type, and its purpose (e.g., `bottom-menu-alert-dot` not `alert-dot`, `session-card` not `card`) | ||
| - Only add the `data-testid` attribute — do not change any other source code | ||
| - Note the code change so it can be committed alongside the test | ||
|
|
||
| **Never** use `@eN` refs in recorded commands — they are session-specific. | ||
|
|
||
| ### Assertions | ||
|
|
||
| Use `wait` and `is` commands as assertions in the recording: | ||
|
|
||
| - `wait --text "Success"` — assert text appears (with timeout) | ||
| - `is visible ".error-message"` — assert element is visible | ||
| - `wait --url "**/dashboard"` — assert navigation happened | ||
|
|
||
| ### Tips | ||
|
|
||
| - Run `pnpm exec agent-browser --help` or `pnpm exec agent-browser <command> --help` to learn unfamiliar commands | ||
| - Start with `wait --load networkidle` after `open` to ensure the page is ready | ||
| - Use `wait --text` over `wait <ms>` — it's more resilient to timing variations | ||
| - Keep recordings short — one user journey per file | ||
| - Name files descriptively: `login-with-email.batch.json`, `send-chat-message.batch.json` | ||
| - The "Chat" nav button toggles the chat list and the option to start a new chat. It is expanded by default on a fresh app. |
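Review note: the recording rules in the `create-e2e-test` skill above (no `snapshot`, `get`, `diff`, `console`, `errors`, `open`, or `connect` in the batch file, no `@eN` refs, no fixed `wait <ms>`) could be checked mechanically before replay. The following is a minimal sketch, not part of this PR. `lint_batch` is a hypothetical helper, and it assumes the batch file keeps one command array per line as in the skill's example and that a fixed wait would be recorded as `["wait", "3000"]` or `["wait", 3000]`.

```shell
#!/usr/bin/env bash
# Hypothetical lint for a *.batch.json recording; not part of the PR.
# Assumes one command array per line, as in the skill's example batch file.
lint_batch() {
  local file="$1" status=0

  # Exploration-only commands must not appear as the first array element.
  if grep -nE '^[[:space:]]*\[[[:space:]]*"(snapshot|get|diff|console|errors|open|connect)"' "$file"; then
    echo "error: exploration-only command in $file" >&2
    status=1
  fi

  # Snapshot refs such as "@e3" are session-specific and must not be recorded.
  if grep -nE '"@e[0-9]+"' "$file"; then
    echo "error: @eN ref in $file" >&2
    status=1
  fi

  # Fixed sleeps (assumed shape: ["wait", "3000"] or ["wait", 3000]).
  if grep -nE '"wait"[[:space:]]*,[[:space:]]*"?[0-9]+"?[[:space:]]*\]' "$file"; then
    echo "error: fixed wait in $file" >&2
    status=1
  fi

  return "$status"
}
```

Usage would look like `lint_batch ui/desktop/tests/e2e-tests/recordings/<name>.batch.json` before calling the replay script. Offending lines are printed with their line numbers, and a non-zero exit status signals a violation. A line-based `grep` is used only to keep the sketch dependency-free; a JSON-aware check would be more robust if the file is ever reformatted.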
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,52 @@ | ||
| --- | ||
| name: e2e-app | ||
| description: Start and stop the Goose Electron app ONLY for e2e testing. Use when you need to launch, manage, or tear down the desktop app for end-to-end tests. | ||
| --- | ||
|
|
||
| # E2E App Management | ||
|
|
||
| Scripts are in `ui/desktop/tests/e2e-tests/scripts/`. | ||
|
|
||
| ## Starting the App | ||
|
|
||
| The start script blocks (runs Electron in foreground), so use `screen` to background it. | ||
| The script self-activates hermit for `pnpm`/`node`, but needs `ANTHROPIC_API_KEY` in the environment. | ||
|
|
||
| ```bash | ||
| TEST_SESSION_NAME=$(date +"%y%m%d-%H%M%S") | ||
| SCREEN_NAME="e2e-$(date +%s)" | ||
| screen -dmS $SCREEN_NAME bash -c "source ~/.zshrc 2>/dev/null; bash ui/desktop/tests/e2e-tests/scripts/e2e-start.sh $TEST_SESSION_NAME" | ||
| ``` | ||
|
|
||
| Then wait for the port file and verify the app is listening: | ||
|
|
||
| ```bash | ||
| # Wait for port file and app to be ready (up to 30s) | ||
| for i in $(seq 1 30); do | ||
| if [[ -f "/tmp/goose-e2e/$TEST_SESSION_NAME/.port" ]]; then | ||
| CDP_PORT=$(cat "/tmp/goose-e2e/$TEST_SESSION_NAME/.port") | ||
| if lsof -i :"$CDP_PORT" &>/dev/null; then | ||
| echo "App ready — Test session name: $TEST_SESSION_NAME, CDP port: $CDP_PORT" | ||
| break | ||
| fi | ||
| fi | ||
| sleep 1 | ||
| done | ||
| ``` | ||
|
|
||
| If the app doesn't start, check the screen log: | ||
| ```bash | ||
| screen -ls # verify screen session exists | ||
| screen -r $SCREEN_NAME # attach to see errors (Ctrl-A D to detach) | ||
| ``` | ||
|
|
||
| Common startup failures: | ||
| - `ANTHROPIC_API_KEY must be set` — key not in environment; ensure `~/.zshrc` exports it | ||
| - `pnpm: not found` — hermit activation failed; the script does this automatically now | ||
| - Screen session dies immediately — check `screen -ls`; if no session, run the script directly to see errors | ||
|
|
||
| ## Stopping the App | ||
|
|
||
| ```bash | ||
| bash ui/desktop/tests/e2e-tests/scripts/e2e-stop.sh <test-session-name> | ||
| ``` |
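Review note: the readiness loop in the `e2e-app` skill above falls through silently if the port file never appears within 30 seconds, so a caller cannot tell a timeout from success. A minimal sketch of the same polling with an explicit failure status follows. `wait_for_port_file` is a hypothetical helper, not part of this PR, and it covers only the port-file wait; the `lsof` check from the original loop would still follow it.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of the PR.
# Polls once per second for a non-empty port file and prints its contents.
# Returns non-zero if the file does not appear within the timeout (default 30s).
wait_for_port_file() {
  local port_file="$1" timeout="${2:-30}" i
  for ((i = 0; i < timeout; i++)); do
    if [[ -s "$port_file" ]]; then
      cat "$port_file"
      return 0
    fi
    sleep 1
  done
  echo "timed out after ${timeout}s waiting for $port_file" >&2
  return 1
}
```

Usage, with the path from the skill: `CDP_PORT=$(wait_for_port_file "/tmp/goose-e2e/$TEST_SESSION_NAME/.port") || exit 1`. The `-s` test (file exists and is non-empty) is a deliberate choice over `-f`, so that a port file that has been created but not yet written is not read as an empty port.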
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -11,3 +11,5 @@ src/bin/goose-npm/ | ||
| src/bin/temporal.db | ||
| # Signing credentials | ||
| .env.signing | ||
|
|
||
| tests/e2e-tests/screenshots/ | ||
Check failure (Code scanning / Semgrep OSS)
Error: Insecure GitHub Actions: Third-Party Action Not Pinned to Commit SHA