Open
Changes from 23 commits
37 commits
194945b
install agent browser
lifeizhou-ap Mar 18, 2026
b127a87
instructions to write e2e test
lifeizhou-ap Mar 19, 2026
d582941
Merge branch 'main' into lifei/agent-browser
lifeizhou-ap Mar 19, 2026
f850b18
first test
lifeizhou-ap Mar 19, 2026
472ccf5
e2e test setup script
lifeizhou-ap Mar 20, 2026
5144227
add skills and generated e2e test
lifeizhou-ap Mar 20, 2026
f9f2259
scripts to run all recorded tests
lifeizhou-ap Mar 20, 2026
bdb89fa
adjust the skills
lifeizhou-ap Mar 23, 2026
300eece
reorganised folder
lifeizhou-ap Mar 23, 2026
26db973
custom extension test
lifeizhou-ap Mar 23, 2026
f90c636
adding more tests
lifeizhou-ap Mar 23, 2026
aa0822e
enable running e2e test in ci
lifeizhou-ap Mar 23, 2026
c54e28a
Merge branch 'main' into lifei/agent-browser
lifeizhou-ap Mar 23, 2026
99ed02f
resolve pnpm install
lifeizhou-ap Mar 23, 2026
1a801c8
updated tests and scripts
lifeizhou-ap Mar 24, 2026
ec54a48
added action sha
lifeizhou-ap Mar 24, 2026
22c4dd1
fixed the test and setup
lifeizhou-ap Mar 24, 2026
3281469
install timeout
lifeizhou-ap Mar 24, 2026
b1fc2bf
update justfile
lifeizhou-ap Mar 24, 2026
3bef76b
changed the structure of the report
lifeizhou-ap Mar 24, 2026
6209133
fixed log file output
lifeizhou-ap Mar 24, 2026
2dcb343
more script change
lifeizhou-ap Mar 24, 2026
8281334
fixed unstable test
lifeizhou-ap Mar 24, 2026
64af9d9
added more timeout for ci run
lifeizhou-ap Mar 24, 2026
c507605
added recording
lifeizhou-ap Mar 24, 2026
0430864
removed per test timeout
lifeizhou-ap Mar 24, 2026
c0c9c54
adjust view port to run ci
lifeizhou-ap Mar 24, 2026
ce6119f
fixed view port problem and first loading page
lifeizhou-ap Mar 24, 2026
278f915
start app one by one, but run test in parallel
lifeizhou-ap Mar 24, 2026
a7afadb
address review comments
lifeizhou-ap Mar 24, 2026
d38e072
make provider, model, api key for testing configurable
lifeizhou-ap Mar 25, 2026
8e49b1f
unset unnecessary variable
lifeizhou-ap Mar 25, 2026
872f027
use different env
lifeizhou-ap Mar 25, 2026
50b849b
address review comments
lifeizhou-ap Mar 25, 2026
cc8d2de
commit missed change
lifeizhou-ap Mar 25, 2026
46dd004
added readme
lifeizhou-ap Mar 25, 2026
6c21015
Merge branch 'main' into lifei/agent-browser
lifeizhou-ap Mar 25, 2026
190 changes: 190 additions & 0 deletions .agents/skills/create-e2e-test/SKILL.md
@@ -0,0 +1,190 @@
---
name: create-e2e-test
description: Create replayable e2e tests for the Goose desktop app. Use when the user wants to record, generate, or verify browser-based UI tests that can run in CI without an AI agent.
---

# Create E2E Test

You are an AI agent that creates replayable e2e test scenarios for the Goose desktop app using agent-browser CLI.

## Goal

Given a test scenario in natural language, you will:

1. Explore the app using agent-browser
2. Record a set of deterministic CLI commands as a batch file that can be replayed without an AI agent

**Do NOT read source code to understand the UI.** Do not read `.tsx`, `.ts`, or `.css` files to find elements. Use `snapshot` to discover what is on the page — that is your only method. The one exception: read source code only when you need to add a `data-testid` attribute.

## App Lifecycle

Every time you need a clean app state — whether starting for the first time, retrying during exploration, or verifying a recording — follow these steps:

1. Use the `e2e-app` skill to stop any running instance and start a new one. Note the **test session name** (e.g., `260320-170823`) and **CDP port**.
2. Connect agent-browser to the CDP port using the test session name:
```bash
pnpm exec agent-browser --session <test-session-name> connect <port>
```

### Agent-browser Session Isolation

agent-browser uses `--session` to isolate browser contexts. This prevents multiple agents or tests from interfering with each other.

- **Agent (exploration + replay)**: always use the current app's test session name (e.g., `--session 260320-170823`). Pass it to **every** agent-browser command and to the replay script via `--browser-session`.
- **In batch JSON**: do **not** include test session names — the replay script handles this.
- **CI**: no `--session` flag needed — the replay script defaults to the recording filename (e.g., `settings-dark-mode.batch.json` → `settings-dark-mode`).

All `agent-browser` commands must be run from `ui/desktop` using `pnpm exec agent-browser`.
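
The CI default described above (session name taken from the recording filename) can be sketched in shell. This is a minimal illustration under stated assumptions, not the replay script's actual code; the function name `session_from_recording` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive a browser session name from a recording path,
# mirroring the documented default
# (settings-dark-mode.batch.json -> settings-dark-mode).
session_from_recording() {
  local recording="$1"
  # basename drops the directory part and the .batch.json suffix
  basename "$recording" .batch.json
}

session_from_recording "recordings/settings-dark-mode.batch.json"
```

Deriving the name from the file keeps parallel CI replays isolated as long as recording filenames are unique.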

## Workflow

### Phase 1: Explore and Record

1. Start the app using the App Lifecycle steps above.

2. Walk through the test scenario step by step. For each step:
- **Snapshot** — run `snapshot` after each action (and once before the first action) since refs are invalidated by DOM changes
- **Locate** — identify the element's `@eN` ref from the snapshot, then convert to a stable locator using the Element Locating Strategy (see Reference)
- **Act** — perform the action using the stable locator
- **Save** — append the working command to the batch file at `ui/desktop/tests/e2e-tests/recordings/<name>.batch.json`

If you need a clean app state at any point, restart using the App Lifecycle steps, then replay the saved batch file to catch up before continuing.

Rules:
- Use `wait --load networkidle` before snapshotting slow pages
- Check `agent-browser errors` if something seems wrong
- Never use `@eN` refs in the recording — convert to stable locators immediately

Example (assuming the test session name from app startup is `260320-170823`):
```bash
# Snapshot
agent-browser --session 260320-170823 snapshot
# Output:
# - textbox "Chat input" [ref=e2]
# - button "Send" [ref=e3]

# Locate — get test-id for @e2
agent-browser --session 260320-170823 get attr @e2 data-testid
# Output: chat-input
agent-browser --session 260320-170823 get count "[data-testid='chat-input']"
# Output: 1

# Act: count is 1, so find testid works
agent-browser --session 260320-170823 find testid "chat-input" fill "hello"

# Snapshot again
agent-browser --session 260320-170823 snapshot

# Locate — get test-id for @e3
agent-browser --session 260320-170823 get attr @e3 data-testid
# Output: send-button
agent-browser --session 260320-170823 get count "[data-testid='send-button']"
# Output: 2 — duplicate! scope to active session

# Act — count > 1, so narrow the selector to target a unique match
agent-browser --session 260320-170823 click "[data-active-session='true'] [data-testid='send-button']"
```

3. Review the test scenario step by step and confirm you have a recorded command for each one. If any steps are missing, go back to step 2.

Example batch file (`ui/desktop/tests/e2e-tests/recordings/<name>.batch.json`):

```json
[
["wait", "[data-testid='chat-input']"],
["fill", "[data-active-session='true'] [data-testid='chat-input']", "hello"],
["wait", "[data-active-session='true'] [data-testid='send-button']"],
["click", "[data-active-session='true'] [data-testid='send-button']"],
["wait", "--text", "Response"]
]
```

Do **not** include in the batch file: `snapshot`, `get`, `diff`, `console`, `errors`, `open`, `connect`

**Never** use `wait <ms>` (e.g., `wait 3000`) in the batch file. Always wait for a specific condition:
- `wait "[data-testid='element']"` — wait for an element to appear
- `wait --text "some text"` — wait for text to appear
- `wait --load networkidle` — wait for page to finish loading
- `wait --url "**/path"` — wait for navigation
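
The two batch-file rules above (no exploration-only commands, no fixed-duration waits) could be checked mechanically before committing a recording. The sketch below is one possible lint, not part of this PR; `lint_batch` is a hypothetical name, and it assumes one command array per line, as in the example batch file.

```bash
#!/usr/bin/env bash
# Hypothetical lint for a recording: flags fixed-duration waits and
# exploration-only commands. Assumes one command array per line.
lint_batch() {
  local file="$1" status=0
  # Matches ["wait", "3000"] or ["wait", 3000]: a bare number of milliseconds
  if grep -nE '\[[[:space:]]*"wait"[[:space:]]*,[[:space:]]*"?[0-9]+"?[[:space:]]*\]' "$file"; then
    echo "error: fixed-duration wait found in $file" >&2
    status=1
  fi
  # Commands that belong to exploration, not to the recording
  if grep -nE '\[[[:space:]]*"(snapshot|get|diff|console|errors|open|connect)"' "$file"; then
    echo "error: exploration-only command found in $file" >&2
    status=1
  fi
  return "$status"
}
```

The function prints each offending line with its line number and returns non-zero if any rule is violated, so it can gate a commit or a CI step.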

### Phase 2: Verify the Recording

1. Add `wait` commands before actions on dynamic elements. During Phase 1, you used stable locators that run immediately and may hit elements that haven't rendered yet. Add a `wait` before any action that targets a dynamic element:

Before:
```bash
find testid "chat-response" click # fails — element not yet on page
```

After:
```bash
wait "[data-testid='chat-response']"
find testid "chat-response" click
```

2. Restart the app using the App Lifecycle steps.

3. Replay the recording:
```bash
bash ui/desktop/tests/e2e-tests/scripts/replay.sh recordings/<name>.batch.json --connect <port> --browser-session <test-session-name>
```
Always pass the current app test session name. Exit code 0 = pass, non-zero = fail.

4. If replay fails, restart the app, explore the failing step using the Phase 1 cycle (snapshot → locate → convert → act) to find the fix, update the recording, and go back to step 2.
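
The exit-code contract in step 3 (0 = pass, non-zero = fail) can be wrapped so the outcome is explicit in logs. This is a minimal sketch; `report_replay` is a hypothetical helper and makes no assumptions about what the replay script prints.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run any replay command and report pass/fail from its
# exit code (0 = pass, non-zero = fail), preserving that code for the caller.
report_replay() {
  local code=0
  "$@" || code=$?
  if [ "$code" -eq 0 ]; then
    echo "PASS"
  else
    echo "FAIL (exit code $code)"
  fi
  return "$code"
}
```

To use it, pass the full `replay.sh` invocation from step 3 as the arguments to `report_replay`.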

### Phase 3: Write the Scenario

After the recording is verified, write (or update) a scenario file at `ui/desktop/tests/e2e-tests/scenarios/<name>.md` (same base name as the recording, e.g., `settings-dark-mode.batch.json` → `settings-dark-mode.md`). This is a human-readable description of what the test does — the intent, not the implementation.

- Describe each step in terms of **user actions and expected outcomes**, not selectors or test IDs
- Keep it concise — one line per step. The file should only contain a title and numbered steps, nothing else
- The scenario serves as the source of truth for re-recording if the test breaks

Example (`scenarios/settings-dark-mode.md`):
```markdown
# Settings: Dark Mode Toggle

1. Open Settings
2. Navigate to the App tab
3. Verify the app is in light mode
4. Switch to dark mode and verify it applies
5. Switch back to light mode and verify it applies
```

## Reference

### Element Locating Strategy

**Always** verify uniqueness with `get count` before using any locator. If count > 1, narrow the selector or fall back to the next strategy.

For each element, find a stable locator using this priority:

1. **Semantic locator (preferred)**: use the role and name directly from the snapshot (e.g., `button "Send"` → `find role button --name "Send" click`). Never use a bare role without `--name`.
- Count is 1 → use `find role <role> --name "<name>" <action>`
- Count > 1 → fall back to step 2

2. **Test ID**: `get attr @eN data-testid` → if it exists, use `find testid "<id>" <action>`.
- If count > 1 and the element is inside a chat session, scope to `[data-active-session='true'] [data-testid='<id>']`
- If still count > 1, use `find first "[data-testid='<id>']" <action>` or `find nth <index> "[data-testid='<id>']" <action>` (0-based index)

3. **Add a data-testid (last resort)**: if neither above works, add a `data-testid` to the source code.
- Names must be globally unique and unambiguous. Include the parent component or location, the element type, and its purpose (e.g., `bottom-menu-alert-dot` not `alert-dot`, `session-card` not `card`)
- Only add the `data-testid` attribute — do not change any other source code
- Note the code change so it can be committed alongside the test

**Never** use `@eN` refs in recorded commands — they are session-specific.
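
The Test ID rule above reduces to a small decision on the match count. The sketch below illustrates that decision only; `selector_for_testid` is a hypothetical helper, and the count is assumed to come from a prior `get count` call.

```bash
#!/usr/bin/env bash
# Hypothetical helper: pick a CSS selector for a test ID based on the match
# count reported by `get count`.
#   $1 = data-testid value, $2 = number of matches for the unscoped selector
selector_for_testid() {
  local id="$1" count="$2"
  if [ "$count" -eq 1 ]; then
    echo "[data-testid='$id']"
  elif [ "$count" -gt 1 ]; then
    # Duplicate IDs: scope to the visible chat session
    echo "[data-active-session='true'] [data-testid='$id']"
  else
    echo "no element with data-testid='$id'" >&2
    return 1
  fi
}

selector_for_testid "send-button" 2
```

The scoped selector should itself be re-checked with `get count`; if it still matches more than one element, the `find first` / `find nth` fallback applies.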

### Assertions

Use `wait` and `is` commands as assertions in the recording:

- `wait --text "Success"` — assert text appears (with timeout)
- `is visible ".error-message"` — assert element is visible
- `wait --url "**/dashboard"` — assert navigation happened

### Tips

- Run `pnpm exec agent-browser --help` or `pnpm exec agent-browser <command> --help` to learn unfamiliar commands
- Start with `wait --load networkidle` after `open` to ensure the page is ready
- Use `wait --text` over `wait <ms>` — it's more resilient to timing variations
- Keep recordings short — one user journey per file
- Name files descriptively: `login-with-email.batch.json`, `send-chat-message.batch.json`
- The "Chat" nav button toggles the chat list and the option to start a new chat. It is expanded by default on a fresh app
52 changes: 52 additions & 0 deletions .agents/skills/e2e-app/SKILL.md
@@ -0,0 +1,52 @@
---
name: e2e-app
description: Start and stop the Goose Electron app ONLY for e2e testing. Use when you need to launch, manage, or tear down the desktop app for end-to-end tests.
---

# E2E App Management

Scripts are in `ui/desktop/tests/e2e-tests/scripts/`.

## Starting the App

The start script blocks (runs Electron in foreground), so use `screen` to background it.
The script self-activates hermit for `pnpm`/`node`, but needs `ANTHROPIC_API_KEY` in the environment.

```bash
TEST_SESSION_NAME=$(date +"%y%m%d-%H%M%S")
SCREEN_NAME="e2e-$(date +%s)"
screen -dmS $SCREEN_NAME bash -c "source ~/.zshrc 2>/dev/null; bash ui/desktop/tests/e2e-tests/scripts/e2e-start.sh $TEST_SESSION_NAME"
```

Then wait for the port file and verify the app is listening:

```bash
# Wait for port file and app to be ready (up to 30s)
for i in $(seq 1 30); do
if [[ -f "/tmp/goose-e2e/sessions/$TEST_SESSION_NAME/.port" ]]; then
CDP_PORT=$(cat /tmp/goose-e2e/sessions/$TEST_SESSION_NAME/.port)
if lsof -i :"$CDP_PORT" &>/dev/null; then
echo "App ready — Test session name: $TEST_SESSION_NAME, CDP port: $CDP_PORT"
break
fi
fi
sleep 1
done
```
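
Note that the loop above ends silently if the app never becomes ready. A variant that fails loudly might look like the sketch below; `wait_for_file` is a hypothetical helper, and it only covers the port-file check, not the `lsof` listening check.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll for a file once per second and fail loudly if it
# does not appear within the given number of seconds (default 30).
wait_for_file() {
  local path="$1" timeout_s="${2:-30}" elapsed=0
  while [ "$elapsed" -lt "$timeout_s" ]; do
    if [ -f "$path" ]; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "timed out after ${timeout_s}s waiting for $path" >&2
  return 1
}
```

Returning non-zero lets the caller stop early instead of continuing with an unset `CDP_PORT`.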

If the app doesn't start, check the screen log:
```bash
screen -ls # verify screen session exists
screen -r $SCREEN_NAME # attach to see errors (Ctrl-A D to detach)
```

Common startup failures:
- `ANTHROPIC_API_KEY must be set`: the key is not in the environment; ensure `~/.zshrc` exports it
- `pnpm: not found`: hermit activation failed; the start script now activates hermit automatically
- Screen session dies immediately: check `screen -ls`; if no session is listed, run the script directly to see errors

## Stopping the App

```bash
bash ui/desktop/tests/e2e-tests/scripts/e2e-stop.sh <test-session-name>
```
37 changes: 35 additions & 2 deletions .github/workflows/pr-smoke-test.yml
@@ -55,15 +55,15 @@ jobs:
with:
ref: ${{ github.event.inputs.branch || github.ref }}

- - uses: actions-rust-lang/setup-rust-toolchain@v1
+ - uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1

- name: Install Dependencies
run: |
sudo apt update -y
sudo apt install -y libdbus-1-dev gnome-keyring libxcb1-dev

- name: Cache Rust dependencies
- uses: Swatinem/rust-cache@v2
+ uses: Swatinem/rust-cache@42dc69e1aa15d09112580998cf2ef0119e2e91ae # v2

- name: Build Binary for Smoke Tests
run: |
@@ -83,6 +83,39 @@ jobs:
path: target/debug/goosed
retention-days: 1

e2e-desktop-tests:
name: E2E Desktop Tests
runs-on: macos-latest
needs: changes
if: needs.changes.outputs.code == 'true' || github.event_name == 'workflow_dispatch'
steps:
- name: Checkout Code
uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
with:
ref: ${{ github.event.inputs.branch || github.ref }}

- uses: actions-rust-lang/setup-rust-toolchain@150fca883cd4034361b621bd4e6a9d34e5143606 # v1

- name: Cache Rust dependencies
uses: Swatinem/rust-cache@42dc69e1aa15d09112580998cf2ef0119e2e91ae # v2

- name: Install GNU timeout (if missing)
run: command -v timeout || brew install coreutils

Review comment (P1): Make timeout available on macOS before running E2E

On macos-latest, this step only runs brew install coreutils when timeout is missing, but the replay harness later calls timeout directly for every command. Homebrew coreutils installs gtimeout unless the gnubin path is added, so in the common case where timeout is absent the E2E run can still fail with timeout: command not found before tests execute. Please either export the coreutils gnubin directory (or symlink) here, or update the scripts to fall back to gtimeout.

Reply from the PR author:

timeout wasn't available on the macOS runner by default, so this step installs coreutils as a fallback. Once installed, timeout is available on PATH and the scripts work. This has been verified in CI.
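
For reference, the `gtimeout` fallback suggested in the review comment could be sketched as below. This is not code from the PR; `resolve_timeout_cmd` is a hypothetical helper, and whether the scripts need it depends on how coreutils exposes `timeout` on the runner.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the name of an available timeout command,
# preferring `timeout` and falling back to `gtimeout`.
resolve_timeout_cmd() {
  if command -v timeout >/dev/null 2>&1; then
    echo "timeout"
  elif command -v gtimeout >/dev/null 2>&1; then
    echo "gtimeout"
  else
    echo "neither timeout nor gtimeout found on PATH" >&2
    return 1
  fi
}
```

A script could resolve the command once at startup and call it through a variable, which works whichever name ends up on `PATH`.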


- name: Run E2E Tests
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
GOOSE_DISABLE_KEYRING: 1
run: source bin/activate-hermit && just e2e

- name: Upload E2E Test Results
if: always()
uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6.0.0
with:
name: e2e-test-results
path: ui/desktop/tests/e2e-tests/results/
retention-days: 7

smoke-tests:
name: Smoke Tests
runs-on: ubuntu-latest
20 changes: 20 additions & 0 deletions Justfile
@@ -77,6 +77,15 @@ release-intel:
cargo build --release --target x86_64-apple-darwin
@just copy-binary-intel

copy-goosed BUILD_MODE="release":
@if [ -f ./target/{{BUILD_MODE}}/goosed ]; then \
echo "Copying goosed binary from target/{{BUILD_MODE}}..."; \
cp -p ./target/{{BUILD_MODE}}/goosed ./ui/desktop/src/bin/; \
else \
echo "goosed binary not found in target/{{BUILD_MODE}}"; \
exit 1; \
fi

copy-binary BUILD_MODE="release":
@if [ -f ./target/{{BUILD_MODE}}/goosed ]; then \
echo "Copying goosed binary from target/{{BUILD_MODE}}..."; \
@@ -464,3 +473,14 @@ build-test-tools:
record-mcp-tests: build-test-tools
GOOSE_RECORD_MCP=1 cargo test --package goose --test mcp_integration_test
git add crates/goose/tests/mcp_replays/

e2e:
@echo "Building goosed..."
cargo build --bin goosed
@just copy-goosed debug
@echo "Installing dependencies..."
cd ui && pnpm install --frozen-lockfile
@echo "Generating API types..."
cd ui/desktop && pnpm run generate-api
@echo "Running E2E tests..."
bash ui/desktop/tests/e2e-tests/scripts/e2e-run-all.sh
3 changes: 3 additions & 0 deletions ui/desktop/.gitignore
@@ -11,3 +11,6 @@ src/bin/goose-npm/
src/bin/temporal.db
# Signing credentials
.env.signing

tests/e2e-tests/results/
tests/e2e-tests/results-rerun
1 change: 1 addition & 0 deletions ui/desktop/package.json
@@ -132,6 +132,7 @@
"@vitejs/plugin-react": "^5.1.4",
"@vitest/coverage-v8": "^4.0.18",
"@vitest/ui": "^4.0.18",
"agent-browser": "^0.20.14",
"autoprefixer": "^10.4.24",
"electron": "41.0.0",
"electron-devtools-installer": "^4.0.0",
2 changes: 2 additions & 0 deletions ui/desktop/src/components/ChatInput.tsx
@@ -1367,6 +1367,7 @@ export default function ChatInput({
size="sm"
shape="round"
variant="outline"
data-testid="send-button"
disabled={isSubmitButtonDisabled}
className={`rounded-full px-10 py-2 flex items-center gap-2 ${
isSubmitButtonDisabled
@@ -1593,6 +1594,7 @@ export default function ChatInput({
variant="ghost"
size="sm"
className="flex items-center justify-center text-text-primary/70 hover:text-text-primary text-xs cursor-pointer"
data-testid="recipe-action-button"
>
<ChefHat size={16} />
</Button>
1 change: 1 addition & 0 deletions ui/desktop/src/components/ChatSessionsContainer.tsx
@@ -46,6 +46,7 @@ export default function ChatSessionsContainer({
key={session.sessionId}
className={`absolute inset-0 ${isVisible ? 'block' : 'hidden'}`}
data-session-id={session.sessionId}
data-active-session={isVisible}
>
<BaseChat
setChat={setChat}
1 change: 1 addition & 0 deletions ui/desktop/src/components/Layout/CondensedRenderer.tsx
@@ -177,6 +177,7 @@ export const CondensedRenderer: React.FC<NavigationRendererProps> = ({
'flex items-center justify-center'
)}
title="New Chat"
data-testid="nav-new-chat-btn"
>
<Plus className="w-4 h-4" />
</motion.button>
@@ -47,7 +47,7 @@ export const ChatSessionsDropdown: React.FC<ChatSessionsDropdownProps> = ({
className="flex items-center gap-2 px-3 py-2 text-sm rounded-lg cursor-pointer"
>
<Plus className="w-4 h-4 flex-shrink-0" />
- <span>New Chat</span>
+ <span data-testid="nav-start-new-chat">New Chat</span>
</DropdownMenuItem>

{sessions.length > 0 && <DropdownMenuSeparator className="my-1" />}
@@ -96,7 +96,7 @@ export const ChatSessionsDropdown: React.FC<ChatSessionsDropdownProps> = ({
className="flex items-center gap-2 px-3 py-2 text-sm rounded-lg cursor-pointer text-text-secondary"
>
<History className="w-4 h-4 flex-shrink-0" />
- <span>Show All</span>
+ <span data-testid="nav-show-all-sessions">Show All</span>
</DropdownMenuItem>
</>
)}