190 changes: 190 additions & 0 deletions .agents/skills/create-e2e-test/SKILL.md
@@ -0,0 +1,190 @@
---
name: create-e2e-test
description: Create replayable e2e tests for the Goose desktop app. Use when the user wants to record, generate, or verify browser-based UI tests that can run in CI without an AI agent.
---

# Create E2E Test

You are an AI agent that creates replayable e2e test scenarios for the Goose desktop app using agent-browser CLI.

## Goal

Given a test scenario in natural language, you will:

1. Explore the app using agent-browser
2. Record a set of deterministic CLI commands as a batch file that can be replayed without an AI agent

**Do NOT read source code to understand the UI.** Do not read `.tsx`, `.ts`, or `.css` files to find elements. Use `snapshot` to discover what is on the page — that is your only method. The one exception: read source code only when you need to add a `data-testid` attribute.

## App Lifecycle

Every time you need a clean app state — whether starting for the first time, retrying during exploration, or verifying a recording — follow these steps:

1. Use the `e2e-app` skill to stop any running instance and start a new one. Note the **test session name** (e.g., `260320-170823`) and **CDP port**.
2. Connect agent-browser to the CDP port using the test session name:
```bash
pnpm exec agent-browser --session <test-session-name> connect <port>
```

### Agent-browser Session Isolation

agent-browser uses `--session` to isolate browser contexts. This prevents multiple agents or tests from interfering with each other.

- **Agent (exploration + replay)**: always use the current app's test session name (e.g., `--session 260320-170823`). Pass it to **every** agent-browser command and to the replay script via `--browser-session`.
- **In batch JSON**: do **not** include test session names — the replay script handles this.
- **CI**: no `--session` flag needed — the replay script defaults to the recording filename (e.g., `settings-dark-mode.batch.json` → `settings-dark-mode`).
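
The CI default described above, where the session name is taken from the recording filename, can be sketched in a few lines of shell. This is an illustration only: `default_session_name` is a hypothetical helper, not a function from the replay script, and the real script may derive the name differently.

```bash
# Hypothetical helper (not part of replay.sh): derive the default browser
# session name from a recording path by dropping the directory and the
# ".batch.json" suffix.
default_session_name() {
  basename "$1" .batch.json
}

default_session_name "recordings/settings-dark-mode.batch.json"
# prints: settings-dark-mode
```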

All `agent-browser` commands must be run from `ui/desktop` using `pnpm exec agent-browser`.

## Workflow

### Phase 1: Explore and Record

1. Start the app using the App Lifecycle steps above.

2. Walk through the test scenario step by step. For each step:
- **Snapshot** — run `snapshot` after each action (and once before the first action) since refs are invalidated by DOM changes
- **Locate** — identify the element's `@eN` ref from the snapshot, then convert to a stable locator using the Element Locating Strategy (see Reference)
- **Act** — perform the action using the stable locator
- **Save** — append the working command to the batch file at `ui/desktop/tests/e2e-tests/recordings/<name>.batch.json`

If you need a clean app state at any point, restart using the App Lifecycle steps, then replay the saved batch file to catch up before continuing.

Rules:
- Use `wait --load networkidle` before snapshotting slow pages
- Check `agent-browser errors` if something seems wrong
- Never use `@eN` refs in the recording — convert to stable locators immediately

Example (assuming the app's test session name is `260320-170823`; commands are shown without the `pnpm exec` prefix):
```bash
# Snapshot
agent-browser --session 260320-170823 snapshot
# Output:
# - textbox "Chat input" [ref=e2]
# - button "Send" [ref=e3]

# Locate: get test-id for @e2, then verify it is unique
agent-browser --session 260320-170823 get attr @e2 data-testid
# Output: chat-input
agent-browser --session 260320-170823 get count "[data-testid='chat-input']"
# Output: 1

# Act: count is 1, so find testid works
agent-browser --session 260320-170823 find testid "chat-input" fill "hello"

# Snapshot again
agent-browser --session 260320-170823 snapshot

# Locate — get test-id for @e3
agent-browser --session 260320-170823 get attr @e3 data-testid
# Output: send-button
agent-browser --session 260320-170823 get count "[data-testid='send-button']"
# Output: 2 — duplicate! scope to active session

# Act — count > 1, so narrow the selector to target a unique match
agent-browser --session 260320-170823 click "[data-active-session='true'] [data-testid='send-button']"
```

3. Review the test scenario step by step and confirm you have a recorded command for each one. If any steps are missing, go back to step 2.

Example batch file (`ui/desktop/tests/e2e-tests/recordings/<name>.batch.json`):

```json
[
["wait", "[data-testid='chat-input']"],
["fill", "[data-active-session='true'] [data-testid='chat-input']", "hello"],
["wait", "[data-active-session='true'] [data-testid='send-button']"],
["click", "[data-active-session='true'] [data-testid='send-button']"],
["wait", "--text", "Response"]
]
```

Do **not** include in the batch file: `snapshot`, `get`, `diff`, `console`, `errors`, `open`, `connect`

**Never** use `wait <ms>` (e.g., `wait 3000`) in the batch file. Always wait for a specific condition:
- `wait "[data-testid='element']"` — wait for an element to appear
- `wait --text "some text"` — wait for text to appear
- `wait --load networkidle` — wait for page to finish loading
- `wait --url "**/path"` — wait for navigation
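
The two rules above (no exploration-only commands, no fixed-delay waits) could be checked mechanically before committing a recording. The following is a minimal sketch, assuming the batch file keeps one command per line as in the example; `lint_batch` is a hypothetical name and is not part of agent-browser or the replay scripts.

```bash
# Hypothetical lint (not part of the repo): returns non-zero if a batch file
# contains an exploration-only command or a fixed-delay wait such as
# ["wait", "3000"]. Assumes one command per line.
lint_batch() {
  local file="$1"
  local status=0
  # Commands that must never appear in a recording
  if grep -nE '^[[:space:]]*\["(snapshot|get|diff|console|errors|open|connect)"' "$file"; then
    echo "error: exploration-only command in $file" >&2
    status=1
  fi
  # A wait whose only argument is a number is a fixed delay
  if grep -nE '^[[:space:]]*\["wait",[[:space:]]*"?[0-9]+"?[[:space:]]*]' "$file"; then
    echo "error: fixed-delay wait in $file" >&2
    status=1
  fi
  return "$status"
}
```

Calling `lint_batch recordings/<name>.batch.json` prints each offending line with its line number and exits non-zero, so it could gate a commit or a CI step.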

### Phase 2: Verify the Recording

1. Add `wait` commands before actions on dynamic elements. Commands recorded in Phase 1 run immediately on replay and may target elements that have not rendered yet. Add a `wait` before any action that targets a dynamic element:

Before:
```bash
find testid "chat-response" click # fails — element not yet on page
```

After:
```bash
wait "[data-testid='chat-response']"
find testid "chat-response" click
```

2. Restart the app using the App Lifecycle steps.

3. Replay the recording:
```bash
bash ui/desktop/tests/e2e-tests/scripts/replay.sh recordings/<name>.batch.json --connect <port> --browser-session <test-session-name>
```
Always pass the current app test session name. Exit code 0 = pass, non-zero = fail.

4. If replay fails, restart the app, explore the failing step using the Phase 1 cycle (snapshot → locate → act) to find the fix, update the recording, and go back to step 2.
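
Because the replay script signals pass or fail through its exit code, several recordings can be checked in one loop. The sketch below is an assumption-laden illustration, not the contents of `e2e-run-all.sh`: `run_recordings` and the `REPLAY_CMD` override are hypothetical names, and flags such as `--connect` and `--browser-session` are left out for brevity.

```bash
# Hypothetical runner (not the repo's e2e-run-all.sh): invokes a replay
# command once per recording and treats exit code 0 as pass, anything else
# as fail. REPLAY_CMD is an illustrative override; it defaults to the replay
# script path used above.
run_recordings() {
  local replay_cmd="${REPLAY_CMD:-bash ui/desktop/tests/e2e-tests/scripts/replay.sh}"
  local failed=0
  local rec
  for rec in "$@"; do
    # Intentionally unquoted so the command string splits into words
    if $replay_cmd "$rec"; then
      echo "PASS $rec"
    else
      echo "FAIL $rec"
      failed=$((failed + 1))
    fi
  done
  echo "$failed failed of $# recordings"
  [ "$failed" -eq 0 ]
}
```

The function returns non-zero if any recording fails, which mirrors the single-recording convention and lets a caller stop on the first broken run.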

### Phase 3: Write the Scenario

After the recording is verified, write (or update) a scenario file at `ui/desktop/tests/e2e-tests/scenarios/<name>.md` (same base name as the recording, e.g., `settings-dark-mode.batch.json` → `settings-dark-mode.md`). This is a human-readable description of what the test does — the intent, not the implementation.

- Describe each step in terms of **user actions and expected outcomes**, not selectors or test IDs
- Keep it concise — one line per step. The file should only contain a title and numbered steps, nothing else
- The scenario serves as the source of truth for re-recording if the test breaks

Example (`scenarios/settings-dark-mode.md`):
```markdown
# Settings: Dark Mode Toggle

1. Open Settings
2. Navigate to the App tab
3. Verify the app is in light mode
4. Switch to dark mode and verify it applies
5. Switch back to light mode and verify it applies
```

## Reference

### Element Locating Strategy

**Always** verify uniqueness with `get count` before using any locator. If count > 1, narrow the selector or fall back to the next strategy.

For each element, find a stable locator using this priority:

1. **Semantic locator (preferred)**: use the role and name directly from the snapshot (e.g., `button "Send"` → `find role button --name "Send" click`). Never use a bare role without `--name`.
- Count is 1 → use `find role <role> --name "<name>" <action>`
- Count > 1 → fall back to step 2

2. **Test ID**: `get attr @eN data-testid` → if exists, use `find testid "<id>" <action>`.
- If count > 1 and the element is inside a chat session, scope to `[data-active-session='true'] [data-testid='<id>']`
- If still count > 1, use `find first "[data-testid='<id>']" <action>` or `find nth <index> "[data-testid='<id>']" <action>` (0-based index)

3. **Add a data-testid (last resort)**: if neither of the above works, add a `data-testid` to the source code.
- Names must be globally unique and unambiguous. Include the parent component or location, the element type, and its purpose (e.g., `bottom-menu-alert-dot` not `alert-dot`, `session-card` not `card`)
- Only add the `data-testid` attribute — do not change any other source code
- Note the code change so it can be committed alongside the test
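
The priority order above can be read as a small decision procedure. The following sketch encodes it under stated assumptions: `choose_locator` is a hypothetical helper, its inputs are the counts you would obtain from `get count`, and the strategy labels it prints are made up for illustration.

```bash
# Hypothetical decision helper mirroring the priority list above.
# Arguments:
#   $1  match count for the role + name locator
#   $2  data-testid value of the element (empty if it has none)
#   $3  match count for the plain test ID selector
#   $4  match count for the selector scoped to the active session
choose_locator() {
  local role_count="$1"
  local testid="$2"
  local testid_count="$3"
  local scoped_count="$4"
  if [ "$role_count" -eq 1 ]; then
    echo "role"            # find role <role> --name "<name>" <action>
  elif [ -z "$testid" ]; then
    echo "add-testid"      # last resort: add a data-testid in source
  elif [ "$testid_count" -eq 1 ]; then
    echo "testid"          # find testid "<id>" <action>
  elif [ "$scoped_count" -eq 1 ]; then
    echo "scoped-testid"   # [data-active-session='true'] [data-testid='<id>']
  else
    echo "nth-testid"      # find first / find nth <index> ...
  fi
}

choose_locator 2 "send-button" 2 1
# prints: scoped-testid
```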

**Never** use `@eN` refs in recorded commands — they are session-specific.

### Assertions

Use `wait` and `is` commands as assertions in the recording:

- `wait --text "Success"` — assert text appears (with timeout)
- `is visible ".error-message"` — assert element is visible
- `wait --url "**/dashboard"` — assert navigation happened

### Tips

- Run `pnpm exec agent-browser --help` or `pnpm exec agent-browser <command> --help` to learn unfamiliar commands
- Start with `wait --load networkidle` after `open` to ensure the page is ready
- Prefer condition-based waits such as `wait --text`; fixed delays (`wait <ms>`) are fragile under timing variations and are not allowed in recordings
- Keep recordings short — one user journey per file
- Name files descriptively: `login-with-email.batch.json`, `send-chat-message.batch.json`
- The "Chat" nav button toggles the chat list and the option to start a new chat. It is expanded by default on a fresh app
52 changes: 52 additions & 0 deletions .agents/skills/e2e-app/SKILL.md
@@ -0,0 +1,52 @@
---
name: e2e-app
description: Start and stop the Goose Electron app ONLY for e2e testing. Use when you need to launch, manage, or tear down the desktop app for end-to-end tests.
---

# E2E App Management

Scripts are in `ui/desktop/tests/e2e-tests/scripts/`.

## Starting the App

The start script blocks (runs Electron in foreground), so use `screen` to background it.
The script self-activates hermit for `pnpm`/`node`, but needs `ANTHROPIC_API_KEY` in the environment.

```bash
TEST_SESSION_NAME=$(date +"%y%m%d-%H%M%S")
SCREEN_NAME="e2e-$(date +%s)"
screen -dmS $SCREEN_NAME bash -c "source ~/.zshrc 2>/dev/null; bash ui/desktop/tests/e2e-tests/scripts/e2e-start.sh $TEST_SESSION_NAME"
```

Then wait for the port file and verify the app is listening:

```bash
# Wait for port file and app to be ready (up to 30s)
for i in $(seq 1 30); do
  if [[ -f "/tmp/goose-e2e/$TEST_SESSION_NAME/.port" ]]; then
    CDP_PORT=$(cat "/tmp/goose-e2e/$TEST_SESSION_NAME/.port")
    # Only report ready once something is listening on the CDP port
    if lsof -i :"$CDP_PORT" &>/dev/null; then
      echo "App ready. Test session name: $TEST_SESSION_NAME, CDP port: $CDP_PORT"
      break
    fi
  fi
  sleep 1
done
```
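
The loop above stops after 30 attempts without saying that it gave up. A variant that reports the timeout through its exit code might look like the sketch below. `wait_for_port_file` is a hypothetical helper, not part of the e2e scripts, and it skips the `lsof` check to stay dependency-free.

```bash
# Hypothetical helper (not part of the e2e scripts): polls for a non-empty
# port file once per second and prints its contents, or returns 1 after the
# timeout. The lsof listening check from the loop above is omitted here.
wait_for_port_file() {
  local port_file="$1"
  local timeout="${2:-30}"
  local elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    if [ -s "$port_file" ]; then
      cat "$port_file"
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "timed out after ${timeout}s waiting for $port_file" >&2
  return 1
}
```

A caller could then write `CDP_PORT=$(wait_for_port_file "/tmp/goose-e2e/$TEST_SESSION_NAME/.port") || exit 1` and fail fast instead of continuing with an empty port.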

If the app doesn't start, attach to the screen session to see its output:
```bash
screen -ls # verify screen session exists
screen -r $SCREEN_NAME # attach to see errors (Ctrl-A D to detach)
```

Common startup failures:
- `ANTHROPIC_API_KEY must be set`: the key is not in the environment; ensure `~/.zshrc` exports it
- `pnpm: not found`: hermit activation failed, even though the script is meant to activate hermit itself
- Screen session dies immediately: check `screen -ls`; if no session is listed, run the script directly to see errors

## Stopping the App

```bash
bash ui/desktop/tests/e2e-tests/scripts/e2e-stop.sh <test-session-name>
```
26 changes: 26 additions & 0 deletions .github/workflows/pr-smoke-test.yml
@@ -83,6 +83,32 @@
          path: target/debug/goosed
          retention-days: 1

  e2e-desktop-tests:
    name: E2E Desktop Tests
    runs-on: macos-latest
    needs: changes
    if: needs.changes.outputs.code == 'true' || github.event_name == 'workflow_dispatch'
    steps:
      - name: Checkout Code
        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
        with:
          ref: ${{ github.event.inputs.branch || github.ref }}

      - uses: actions-rust-lang/setup-rust-toolchain@v1

Check failure (Code scanning / Semgrep OSS): Insecure GitHub Actions: Third-Party Action Not Pinned to Commit SHA

      - name: Cache Rust dependencies
        uses: Swatinem/rust-cache@v2

Check failure (Code scanning / Semgrep OSS): Insecure GitHub Actions: Third-Party Action Not Pinned to Commit SHA

      - name: Install Node.js Dependencies
        run: source ../../bin/activate-hermit && pnpm install --frozen-lockfile
        working-directory: ui/desktop

      - name: Run E2E Tests
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GOOSE_DISABLE_KEYRING: 1
        run: source bin/activate-hermit && just e2e

  smoke-tests:
    name: Smoke Tests
    runs-on: ubuntu-latest
9 changes: 9 additions & 0 deletions Justfile
@@ -443,3 +443,12 @@ build-test-tools:
record-mcp-tests: build-test-tools
    GOOSE_RECORD_MCP=1 cargo test --package goose --test mcp_integration_test
    git add crates/goose/tests/mcp_replays/

e2e:
    @echo "Building goosed..."
    cargo build --bin goosed
    @just copy-binary debug
    @echo "Generating API types..."
    cd ui/desktop && pnpm run generate-api
    @echo "Running E2E tests..."
    bash ui/desktop/tests/e2e-tests/scripts/e2e-run-all.sh
2 changes: 2 additions & 0 deletions ui/desktop/.gitignore
@@ -11,3 +11,5 @@ src/bin/goose-npm/
src/bin/temporal.db
# Signing credentials
.env.signing

tests/e2e-tests/screenshots/
4 changes: 3 additions & 1 deletion ui/desktop/package.json
@@ -131,6 +131,7 @@
"@vitejs/plugin-react": "^5.1.4",
"@vitest/coverage-v8": "^4.0.18",
"@vitest/ui": "^4.0.18",
"agent-browser": "^0.20.14",
"autoprefixer": "^10.4.24",
"electron": "41.0.0",
"electron-devtools-installer": "^4.0.0",
@@ -159,7 +160,8 @@
"@modelcontextprotocol/ext-apps",
"electron",
"electron-winstaller",
"esbuild"
"esbuild",
"agent-browser"
]
},
"lint-staged": {
17 changes: 16 additions & 1 deletion ui/desktop/pnpm-lock.yaml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 2 additions & 0 deletions ui/desktop/src/components/ChatInput.tsx
@@ -1367,6 +1367,7 @@ export default function ChatInput({
size="sm"
shape="round"
variant="outline"
data-testid="send-button"
disabled={isSubmitButtonDisabled}
className={`rounded-full px-10 py-2 flex items-center gap-2 ${
isSubmitButtonDisabled
@@ -1593,6 +1594,7 @@ export default function ChatInput({
variant="ghost"
size="sm"
className="flex items-center justify-center text-text-primary/70 hover:text-text-primary text-xs cursor-pointer"
data-testid="recipe-action-button"
>
<ChefHat size={16} />
</Button>
1 change: 1 addition & 0 deletions ui/desktop/src/components/ChatSessionsContainer.tsx
@@ -46,6 +46,7 @@ export default function ChatSessionsContainer({
key={session.sessionId}
className={`absolute inset-0 ${isVisible ? 'block' : 'hidden'}`}
data-session-id={session.sessionId}
data-active-session={isVisible}
>
<BaseChat
setChat={setChat}