20 changes: 20 additions & 0 deletions AGENTS.md
@@ -175,6 +175,26 @@ If you touched `ansible/`, also follow <ansible/AGENTS.md>.

`plans/`: for future work or work in progress. Once a plan is fully completed, remove it from `plans/` (delete, or squash into short tombstone/summary elsewhere).

### SPEC.md — High-level component specifications

`<subproject>/SPEC.md`: high-level, user-facing specification of what a
component guarantees to its users. An outside observer should be able to read
SPEC.md to understand what behaviors they can rely on, without having to read
the implementation. Example: <devinfra/claude/hook_daemon/SPEC.md> describes
what the Claude Code hook daemon provides to every session, and the
`/web_selfcheck` skill runs the acceptance tests derived from it.

SPEC.md files **must** be updated when the high-level requirements of the
thing they cover change — a new class of credential gets injected, a new
shim behavior is added, a new profile lands, a new promise is made to the
agent, etc.

SPEC.md files **must not** record low-level implementation details that an
outside observer would not notice. "Credentials are refreshed regularly by
the backend service" belongs in SPEC.md; "credentials live in
`<session_dir>/creds.json` and rotate every 300s via RPC to
`rotate.example.com`" does not — that belongs in README.md or in the code.

### TODO Tracking

Subprojects use `TODO.md` for persistent TODO tracking. TODOs local to a specific code location are fine as inline comments; cross-cutting or project-level TODOs belong in `TODO.md`.
7 changes: 7 additions & 0 deletions devinfra/claude/README.md
@@ -45,6 +45,13 @@ By preserving the original proxy env vars:
- JWT token refreshes are automatically picked up
- The bazelisk shim sends fresh credentials to the daemon on each invocation

## Specification

See <hook_daemon/SPEC.md> for the high-level, user-facing specification of
what the hook daemon guarantees to every Claude Code session (on CLI and on
web). Read that first if you want to know **what** the daemon does for the
agent — this README covers **how** those behaviors are implemented.

## Components

- **Session Start Hook**: Sets up the development environment for Claude Code web sessions
275 changes: 275 additions & 0 deletions devinfra/claude/hook_daemon/SPEC.md
@@ -0,0 +1,275 @@
# Hook Daemon Specification

See @README.md for architectural and implementation details.

## Overview

Every Claude Code session — whether it is running in Claude Code CLI on a
developer workstation or in Claude Code on the web inside a sandboxed container
— is paired with a **session-scoped hook daemon**. The daemon is launched by
Claude Code's `SessionStart` hook and lives for the duration of the session.

Its job is to make every session look the same to the agent:

- Bazel (via `bbr` / `bazelisk` / `bb`) is wired up to BuildBuddy and works
out of the box: plain `bazelisk build <target>` or `bb build <target>`
automatically uses BuildBuddy remote execution and remote cache, with no
extra flags from the agent.
- Credentials the agent needs (BuildBuddy, GitHub, Kubernetes, tracing) are
available in the environment without the agent having to fetch or decrypt
them.
- On CLI, dangerous or footgun git operations are blocked by a PATH shim
(the web profile does not install it).
- Pre-commit lint/format hooks run automatically on Edit/Write and their
failures are reported back to the agent.
- The `claude-sandbox-kubectl` MCP server is configured to talk to the
cluster as the expected Claude identity.
- Hook activity is traced to the central OpenTelemetry collector.

The daemon exposes two **profiles** — `cli` and `web` — that differ both in
what the surrounding environment is expected to provide and in which
behaviors are enabled (e.g., the git safety shim and direnv bridge are
CLI-only; egress proxy handling, mkcert, tmpfs, managed credentials, and
idle shutdown are web-only).

## Common Behaviors (CLI and Web)

These guarantees hold in every session, regardless of profile.

### Credentials in the agent's environment

Every Bash tool call sees:

- A valid `BUILDBUDDY_API_KEY`.
- A valid `GITHUB_TOKEN` (on web this is the `agentydragon-agent` machine
user; on CLI this is whatever token the user's outer shell already
exposes).
- `DUCKTAPE_OTEL_BEARER_TOKEN` for tracing.

The agent should never need to decrypt SOPS files or run `gh auth login`
manually — if a credential is missing, the daemon is broken.
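
A minimal sketch of the presence check this implies, assuming the three variables listed above are the complete required set. The helper name `missing_credentials` is hypothetical and not part of the daemon. It checks only that the variables are set, not that the credentials are valid.

```python
import os

# Names taken from the list above. Treating them as the complete required
# set is an assumption of this sketch.
REQUIRED_CREDENTIALS = (
    "BUILDBUDDY_API_KEY",
    "GITHUB_TOKEN",
    "DUCKTAPE_OTEL_BEARER_TOKEN",
)


def missing_credentials(env=None):
    """Return the required variable names that are unset or blank in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_CREDENTIALS if not env.get(name, "").strip()]
```

An empty list means all three variables are present. A non-empty list names what the daemon failed to provide.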

### Bazel / BuildBuddy

- `bazelisk`, `bb`, and `bbr` on `PATH` are wired to BuildBuddy.
- A plain `bazelisk build <target>` or `bb build <target>` automatically
uses BuildBuddy remote execution **and** remote cache out of the box.
The agent does not need to pass `--config=rbe`, `--remote_executor=...`,
`--remote_cache=...`, or any authentication flags.
- BuildBuddy invocations are tagged with the session ID so they can be
filtered later via `bbapi invocation list --tag session:<id>`.
- **`bbr` preserves the Bazel analysis cache across invocations on a
best-effort basis.** A second `bbr` run with the same inputs should
usually land on a warm BuildBuddy runner whose analysis cache is already
populated, making it substantially faster than a cold build. This is not
a hard guarantee: today runners are shared across all concurrent
sessions and may be evicted or rotated, so an occasional cold hit is
acceptable. A session where _every_ `bbr` call is cold is broken.

### Pre-commit lint & format on Edit/Write

When the agent edits a file via the Edit or Write tool, the daemon runs the
project's `pre-commit` configuration against the touched files as a
`PostToolUse` hook:

- Pure format/whitespace hooks (e.g. `ruff-format`) are **auto-applied**
and the fixed file is kept. See the profile YAMLs under <profiles/> for
the full auto-apply list.
- Any other hook that fails blocks the edit: changes made by that hook are
reverted and the failure is reported back to Claude as a `PostToolUse`
block, so Claude can fix and retry.
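
The allow-or-block policy described above can be sketched as a pure function. The `HookResult` structure and the `decide` helper are hypothetical, introduced only for illustration. The sketch also assumes that an auto-apply hook which fails without modifying any file should still block, which the text above does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HookResult:
    """Outcome of one pre-commit hook run (hypothetical structure)."""

    hook_id: str
    passed: bool
    modified_files: bool
    output: str = ""


def decide(results, auto_apply):
    """Return ("allow", None) or ("block", reason) for an edit.

    auto_apply is the set of hook ids whose file modifications are kept.
    A failed hook is tolerated only if it is in auto_apply and its failure
    consisted of rewriting files (the usual formatter behavior).
    """
    failures = [
        r
        for r in results
        if not r.passed and not (r.hook_id in auto_apply and r.modified_files)
    ]
    if not failures:
        return ("allow", None)
    reason = "\n".join(f"{r.hook_id}: {r.output}".rstrip() for r in failures)
    return ("block", reason)
```

Keeping the decision separate from running the hooks makes the policy easy to test without invoking `pre-commit`.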

### OpenTelemetry tracing

- Every hook invocation (SessionStart, PreToolUse, PostToolUse, background
tasks) is traced to the central OTLP collector with a bearer token.
- Traces are keyed by session ID so they can be retrieved per session for
debugging.

### MCP servers

- The `claude-sandbox-kubectl` MCP server is configured and authenticated so
that `kubectl`-equivalent calls act as the cluster's designated Claude
identity (see <../../../cluster/k8s/agents/claude-rbac/>). The agent
should always prefer it over raw `Bash(kubectl ...)` for `claude-sandbox`
operations.

### Observability

- Hook daemon logs are available on disk under the session directory for the
duration of the session (exact path documented in <README.md>).
- A session context banner surfaces warnings from setup and background tasks
to the agent at SessionStart.

## CLI Profile

The CLI profile targets a developer workstation where the user is already
logged in and has a `nix`/`direnv`-managed devshell. The daemon therefore
relies on the outer environment for most credentials and focuses on safety
rails.

### What the surrounding environment provides

- **Credentials come from `.envrc`** (via `direnv`), which sources the
repo's encrypted CLI env script. `BUILDBUDDY_API_KEY`, `GITHUB_TOKEN`, and
`DUCKTAPE_OTEL_BEARER_TOKEN` are expected to already be in the process
environment when Claude Code launches. They reflect the **user's own**
identity (the developer's GitHub PAT, the user's own BuildBuddy key).
- **Kubeconfig comes from `~/.kube/config`** — the user's personal cluster
access. The daemon does not write its own kubeconfig; MCP and `kubectl`
use whatever the user has.
- **The devshell provides `bazelisk`, `bb`, `sops`, `gh`, etc.** on PATH via
Nix home-manager.

The daemon's job is to propagate those env vars into every Bash tool call
(since Claude Code's Bash tool does not automatically run through direnv) and
to layer the shims on top.
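
As a hedged illustration of that propagation step, the sketch below applies an environment diff to a base environment. It assumes the diff is shaped like the JSON that `direnv export json` prints, where a `null` value means the variable should be unset. Whether the daemon actually consumes direnv output this way is an assumption, and `apply_env_diff` is a hypothetical name.

```python
import json


def apply_env_diff(base_env, diff_json):
    """Return a copy of base_env with a direnv-style JSON diff applied.

    diff_json maps variable names to string values, or to null for
    variables that should be removed. An empty string means no changes.
    """
    env = dict(base_env)
    diff = json.loads(diff_json) if diff_json.strip() else {}
    for name, value in diff.items():
        if value is None:
            env.pop(name, None)  # variable was unset by the .envrc
        else:
            env[name] = value
    return env
```

The function returns a new dict and leaves `base_env` untouched, so each Bash tool call can start from the same baseline.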

### CLI-specific guarantees

- **Git safety shim.** A `git` wrapper on PATH blocks footgun commands:
- `git commit --amend` (prevents rewriting shared history)
- `git add -A` / `git add .` (forces explicit file listing)
- `git stash` (prevents accidental stash-and-forget)

Blocked commands exit non-zero with a clear error and are never run.
Read-only operations (`git stash list`, `git stash show`) are allowed.

- **direnv bridge.** Every Bash tool call sees the env exported by the
nearest `.envrc`, so `cd`-ing between subprojects picks up the right
devshell environment.
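
The git safety shim's blocking rules listed above can be sketched as a pure function. This is an illustration under stated assumptions, not the shim's actual code: `blocked_reason` is a hypothetical name, the reason strings are invented, `--all` is treated as equivalent to `-A`, and global options placed before the subcommand (such as `git -C <dir>`) are not handled.

```python
def blocked_reason(argv):
    """Return why `git <argv>` would be blocked, or None if it is allowed.

    Simplification: argv[0] is assumed to be the git subcommand.
    """
    if not argv:
        return None
    sub, rest = argv[0], argv[1:]
    if sub == "commit" and "--amend" in rest:
        return "git commit --amend rewrites history"
    if sub == "add" and any(arg in ("-A", "--all", ".") for arg in rest):
        return "git add -A / git add . stages files implicitly"
    # Only the read-only stash subcommands named above are allowed.
    if sub == "stash" and (not rest or rest[0] not in ("list", "show")):
        return "git stash hides work in progress"
    return None
```

A wrapper built on this would print the reason and exit non-zero when the result is not `None`, and otherwise pass the arguments through to the real `git`.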

### What CLI does NOT do

- Does not configure an egress proxy.
- Does not set up tmpfs, mkcert, docker, or supervisor.
- Does not write a kubeconfig — the user provides one.
- Does not idle-shutdown.

## Web Profile

The Web profile targets Claude Code on the web, running inside a sandboxed
container with TLS-inspecting network egress. The surrounding environment
provides almost nothing beyond a SOPS age key and the agent's container
identity; the daemon is responsible for standing up everything else.

### What the surrounding environment provides

- The **`web_setup.sh`** bootstrap script has already run and installed Nix,
devtools, skills, and a `settings.local.json` containing secrets needed by
MCP servers.
- A **SOPS age key** (`SOPS_AGE_KEY`) that can decrypt the repo's
`claude-web` secrets.
- A **TLS-inspecting egress proxy** via `HTTPS_PROXY` / `HTTP_PROXY` with a
periodically-refreshed JWT. The pre-installed TLS inspection CA is present
on the container filesystem.

### Web-specific guarantees

- **Managed credentials.** The daemon decrypts SOPS secrets at startup and
injects them into the agent's environment:
- `BUILDBUDDY_API_KEY` — shared BuildBuddy key for the claude-web identity.
- `GITHUB_TOKEN` — the **`agentydragon-agent` machine user PAT**, not a
personal token. The agent commits and pushes as that identity.
- `DUCKTAPE_OTEL_BEARER_TOKEN` — for tracing.
- Kubernetes service account token (see below).

Credentials that the cluster rotates (e.g., the k8s service account token)
are refreshed regularly so that long-running sessions keep working. The
agent should never see a stale token as a session drags on.

- **Kubernetes access as `claude-code-web` ServiceAccount.** The daemon
writes a kubeconfig pointing at the cluster API, authenticated as the
`claude-code-web` ServiceAccount. `KUBECONFIG` is exported into the
agent's environment, and the `claude-sandbox-kubectl` MCP server uses the
same identity. Both `kubectl` and MCP calls land with the RBAC documented
in <../../../AGENTS.md>.

- **GitHub fork remote.** If the machine user has a fork of the repo, the
daemon configures it as a `fork` remote with push credentials, so that
`git push -u fork <branch>` works without further setup.

- **Network to BuildBuddy works out of the box.** `bazelisk`, `bb`, and
`bbr` reach BuildBuddy and the Bazel Central Registry successfully on
the first invocation. The agent never has to configure CA bundles,
truststores, proxy env vars, or `--remote_proxy` flags to get builds
working over the container's constrained egress.

- **Container runtime.** Docker, supervisor, and mkcert are set up so that
integration tests that need a local container runtime work.

- **Tmpfs caching.** Performance-sensitive caches (Bazel output base, Docker
storage when the container root is slow) are backed by tmpfs. From the
agent's perspective this is invisible except that Bazel is not absurdly
slow.

- **Idle shutdown.** The daemon auto-exits after a period of inactivity so
stale containers don't accumulate.
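
Idle shutdown, the last guarantee above, can be sketched as a small tracker around a monotonic clock. The class name, the notion of "activity", and the timeout value are all hypothetical. This document does not specify how long the inactivity period is or what resets it.

```python
import time


class IdleTracker:
    """Tracks time since the last activity using a monotonic clock."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock  # injectable so tests can control time
        self._last_activity = clock()

    def touch(self):
        """Record activity (for example, a hook invocation)."""
        self._last_activity = self._clock()

    def should_exit(self):
        """True once no activity has been seen for at least timeout_s."""
        return self._clock() - self._last_activity >= self._timeout_s
```

A daemon loop would call `touch()` on each hook invocation and poll `should_exit()` periodically. Using a monotonic clock avoids spurious shutdowns when the wall clock is adjusted.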

### What Web does NOT do

- Does **not** install the git safety shim. (Web sessions push to a fork,
not to `devel`, so `git commit --amend` and `git add -A` are less
dangerous. If this ever changes, update this file.)

## Observable Acceptance Criteria

These are the checks that the `/web_selfcheck` skill effectively runs as an
acceptance test against a live session. A healthy session satisfies all
applicable criteria for its profile.

### Common

1. `$BUILDBUDDY_API_KEY` is non-empty, and a GetUser RPC against
`remote.buildbuddy.io` authenticates successfully.
2. `$GITHUB_TOKEN` is non-empty, and `GET https://api.github.com/user`
returns the expected login (`agentydragon-agent` on web, the developer's
own login on CLI).
3. `bbr build <trivial target> --nobuild` succeeds without TLS or proxy
errors.
4. `bazelisk` on PATH points at the daemon's shim, and invocations are
tagged with `session:<id>` in BuildBuddy.
5. Editing a Python file via Write/Edit triggers `ruff-format`
(auto-applied) and, on a lint violation, the edit is blocked with a
clear reason.
6. Pre-commit runs end to end: a throwaway commit on a scratch branch
passes all hooks.
7. Hook daemon logs are present and contain no unhandled exceptions from
SessionStart.
8. Tracing reaches the OTLP collector (the bearer token test returns a
non-auth-error status).
9. Running `bbr build <target>` twice in a row with identical inputs
lands on a warm runner the second time: the second invocation's
analysis phase is substantially faster than the first (rule of thumb:
warm < cold / 3). This is best-effort; a single cold-hit failure can
be transient (runner rotation, cache eviction), but consistent
cold-every-time across repeated runs is a daemon bug.
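
The rule of thumb in criterion 9 can be written down as a small check. This is a sketch only: the function names are hypothetical, and how the analysis-phase timings are obtained from BuildBuddy is left out.

```python
def is_warm(cold_analysis_s, warm_analysis_s, ratio=3.0):
    """Apply the rule of thumb from criterion 9: warm < cold / 3."""
    if cold_analysis_s <= 0:
        raise ValueError("cold analysis time must be positive")
    return warm_analysis_s < cold_analysis_s / ratio


def consistently_cold(pairs, ratio=3.0):
    """True if every (cold, warm) timing pair misses the warm threshold.

    A single miss can be transient, so only an all-miss series over at
    least one pair is reported as a problem.
    """
    return bool(pairs) and not any(is_warm(c, w, ratio) for c, w in pairs)
```

For example, a 30 s cold analysis followed by a 5 s second run counts as warm, while a 12 s second run does not. Only `consistently_cold` returning true across repeated runs points at a daemon bug.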

### CLI only

1. `git commit --amend` fails with a `[git-shim] BLOCKED` error.
2. `git add -A` / `git add .` fails with a `[git-shim] BLOCKED` error.
3. `git stash` (without `list`/`show`) fails with a `[git-shim] BLOCKED`
error.
4. A `cd` into a subproject with its own `.envrc` propagates the expected
env vars into the next Bash tool call.

### Web only

1. `kubectl get pods -n claude-sandbox` works, authenticated as
`claude-code-web`. The `claude-sandbox-kubectl` MCP server returns the
same pod list.
2. `$GITHUB_TOKEN` resolves to the `agentydragon-agent` machine user (not a
personal account).
3. `git remote -v` shows a `fork` remote with push access to the machine
user's fork.
4. `bbr build <any target>` works out of the box. No extra flags, no
manual `git remote` setup, no prompt asking the user to pick a remote —
the default remote is selected automatically, and the build reaches a
BuildBuddy runner on the first try.
5. Docker is available (`docker info` succeeds) for tests that need a local
container runtime.

Anything that fails these criteria is a daemon bug, not a user problem. The
`/web_selfcheck` skill is the canonical runnable acceptance test for this
spec.
4 changes: 4 additions & 0 deletions devinfra/claude/hook_daemon/bes_interceptor.py
@@ -7,6 +7,10 @@

If a build/test invocation lacks --remote_executor, a mailbox message is posted
to the session nudging the agent toward `bb remote`.

TODO: this nudge behavior is experimental and deliberately NOT in SPEC.md yet.
If it proves reliable and useful, promote it to a committed behavior under
"Common Behaviors" in <SPEC.md>.
"""

from __future__ import annotations