feat: add --sandbox=container mode to vault run #99
Launches the agent inside a Docker container with iptables-locked egress, so the child physically cannot reach anything except the Agent Vault proxy — regardless of what it tries. Opt-in for now; --sandbox=process remains the default. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
|---|---|---|---|---|---|
| 30536913 | Triggered | Generic Password | eab09d8 | internal/sandbox/cacopy_test.go | View secret |
On macOS Docker Desktop, `getent hosts host.docker.internal` returns the AAAA record first, which init-firewall.sh then rejected as "not a plain IPv4 literal" and aborted the container. Our iptables rules are IPv4, so we need `getent ahostsv4` which only walks A records. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
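The "plain IPv4 literal" check that tripped on the AAAA record lives in init-firewall.sh, which is not shown here. As a rough illustration of the same test, here is a minimal Go sketch; `isPlainIPv4` is a hypothetical helper, not a function from this PR.

```go
package main

import (
	"fmt"
	"net/netip"
)

// isPlainIPv4 reports whether s is a dotted-quad IPv4 literal.
// IPv6 addresses (including IPv4-mapped ones) and hostnames are rejected.
// Hypothetical helper for illustration only.
func isPlainIPv4(s string) bool {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return false
	}
	return addr.Is4()
}

func main() {
	// Example inputs: an A-record style answer, an AAAA-record style answer,
	// and an unresolved name.
	for _, s := range []string{"192.168.65.254", "fdc4:f303:9324::254", "host.docker.internal"} {
		fmt.Printf("%-22s plain IPv4: %v\n", s, isPlainIPv4(s))
	}
}
```

An AAAA answer fails this check by construction, which is why asking the resolver for A records only (`getent ahostsv4`) fits IPv4-only rules better than loosening the validation.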
- --cap-drop=ALL strips CAP_SETUID/CAP_SETGID, so gosu in entrypoint.sh failed with EPERM when dropping root → claude. Re-add both caps; the non-root claude process still has an empty effective cap set after gosu, so the sandbox contract is unchanged.
- Quiet gosec G302 on the CA-file 0o644 Chmod: the container's claude user has to read the bind mount and the parent dir is 0o700, so the host attack surface is unchanged.
- Tighten the test-fixture WriteFile to 0o600 (G306) and wrap the forwarder's deferred Close in an explicit _ = (errcheck).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
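For readers unfamiliar with the drop-then-re-add pattern, the sketch below assembles the relevant `docker run` flags. The flag spellings are standard Docker options and the capability names come from this PR (NET_ADMIN/NET_RAW from the description, SETUID/SETGID from this commit); `securityArgs` is a hypothetical helper, and the PR's actual builder (`BuildRunArgs`) is not reproduced here.

```go
package main

import (
	"fmt"
	"strings"
)

// securityArgs returns docker run flags that drop every capability,
// forbid privilege escalation, and re-add only the listed capabilities.
// Hypothetical helper for illustration only.
func securityArgs(caps []string) []string {
	args := []string{"--cap-drop=ALL", "--security-opt=no-new-privileges"}
	for _, c := range caps {
		args = append(args, "--cap-add="+c)
	}
	return args
}

func main() {
	// NET_ADMIN/NET_RAW for the firewall init; SETUID/SETGID so gosu can
	// switch from root to the claude user.
	args := securityArgs([]string{"NET_ADMIN", "NET_RAW", "SETUID", "SETGID"})
	fmt.Println("docker run " + strings.Join(args, " ") + " <image>")
}
```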
Binding on 127.0.0.1 relied on Docker Desktop's vpnkit routing host.docker.internal traffic to the host's lo0. On newer Docker Desktop builds (VZ / virtiofsd on Apple Silicon), that traffic is delivered to a different host interface, so the loopback listener never received the container's connection and the forwarder was unreachable — producing ECONNREFUSED on the HTTPS_PROXY path. Bind 0.0.0.0 instead to accept on whichever interface Desktop routes through. The broker still requires a vault-scoped session token on every request, so LAN reachability on an ephemeral port is not a meaningful attack surface. Linux path (bridge-gateway bind) is unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
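To make the bind-address discussion concrete, here is a minimal sketch of a context-cancellable TCP relay whose listen address is a parameter. It is not the PR's forwarder.go; `startRelay`, `pipe`, and `demo` are hypothetical names, and the demo stays on loopback with an in-process echo server standing in for the broker.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
)

// startRelay listens on bindAddr and copies bytes between each accepted
// connection and upstream until ctx is cancelled. It returns the bound address.
func startRelay(ctx context.Context, bindAddr, upstream string) (net.Addr, error) {
	ln, err := net.Listen("tcp", bindAddr)
	if err != nil {
		return nil, err
	}
	go func() {
		<-ctx.Done()
		_ = ln.Close() // unblocks Accept below
	}()
	go func() {
		for {
			conn, err := ln.Accept()
			if err != nil {
				return // listener closed
			}
			go pipe(conn, upstream)
		}
	}()
	return ln.Addr(), nil
}

// pipe relays one client connection to upstream in both directions.
func pipe(client net.Conn, upstream string) {
	defer func() { _ = client.Close() }()
	up, err := net.Dial("tcp", upstream)
	if err != nil {
		return
	}
	defer func() { _ = up.Close() }()
	go func() {
		_, _ = io.Copy(up, client)
		_ = up.Close() // client finished: unblock the copy below
	}()
	_, _ = io.Copy(client, up)
}

// demo sends "ping" through a relay to a throwaway echo server.
func demo() (string, error) {
	echo, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer func() { _ = echo.Close() }()
	go func() {
		for {
			c, err := echo.Accept()
			if err != nil {
				return
			}
			go func() { defer c.Close(); _, _ = io.Copy(c, c) }()
		}
	}()

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	// Passing "0.0.0.0:0" here would accept on every interface instead.
	addr, err := startRelay(ctx, "127.0.0.1:0", echo.Addr().String())
	if err != nil {
		return "", err
	}
	conn, err := net.Dial("tcp", addr.String())
	if err != nil {
		return "", err
	}
	defer func() { _ = conn.Close() }()
	if _, err := conn.Write([]byte("ping")); err != nil {
		return "", err
	}
	buf := make([]byte, 4)
	if _, err := io.ReadFull(conn, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	got, err := demo()
	fmt.Println(got, err)
}
```

Only the `bindAddr` argument changes between the loopback and all-interfaces variants; the relay logic itself is unaffected by the choice.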
Listeners set SOCK_CLOEXEC by default in Go, so syscall.Exec("docker",
…) closed the forwarder sockets before the container even started.
Claude's HTTPS_PROXY calls then hit an empty port and the HTTP client
surfaced ECONNREFUSED against api.anthropic.com.
Replace the exec with fork+Wait: stdio is passed through, signals are
ignored in the parent so the kernel delivers them to docker (which
fans them out via --init/tini → claude), and the forwarder goroutines
stay live for the container's lifetime. Exit with the child's exit
code on non-zero. Defer-based network cleanup now actually runs on
the success path too.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
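The shape of the exec-to-wait change can be sketched roughly as follows. This is not the code from cmd/run_container.go; `runChild` is a hypothetical helper, `sh` stands in for `docker`, and the parent catches and discards signals via `signal.Notify` rather than `signal.Ignore`, because an ignored disposition would be inherited by the child.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

// runChild starts name as a child process with inherited stdio, waits for
// it, and returns its exit code. Unlike syscall.Exec, the parent process
// (and any goroutines or listeners it owns) stays alive while the child runs.
func runChild(name string, args ...string) (int, error) {
	cmd := exec.Command(name, args...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr

	// Receive and discard SIGINT/SIGTERM so they do not kill the parent.
	// Terminal-generated signals still reach the child, which shares the
	// foreground process group. A signal sent only to the parent's PID is
	// not forwarded by this sketch.
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, os.Interrupt, syscall.SIGTERM)
	defer signal.Stop(sigs)

	err := cmd.Run()
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return exitErr.ExitCode(), nil
	}
	if err != nil {
		return -1, err // the child could not be started or waited on
	}
	return 0, nil
}

func main() {
	code, err := runChild("sh", "-c", "exit 42")
	fmt.Println(code, err)
}
```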
gocritic flagged os.Exit as bypassing the deferred signal.Stop and network teardown. Wrap the ExitError in a clear error string so Cobra prints it and exits 1 — losing the exact child exit code, but keeping network cleanup + signal-handler cleanup intact. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1. Validate the workspace (CWD) against the same reserved-host-path rules as user --mount. Running vault run --sandbox=container from inside ~/.agent-vault previously bind-mounted the encrypted CA key + vault database into the container read-write.
2. Lock down IPv6 egress in init-firewall.sh. iptables rules alone left the ip6tables OUTPUT chain at default ACCEPT, so on Docker daemons with IPv6 enabled the agent had unrestricted v6 egress. ip6tables now default-denies; we resolve host.docker.internal via ahostsv4 so v4 is the only path we need.
3. Clean up per-invocation agent-vault-claude-home-<sid> volumes. Docker's --rm removes the container but not named volumes, so previously one claude-home volume leaked per invocation. Add deferred RemoveVolume + startup PruneStaleVolumes (analogous to PruneStaleNetworks). The shared volume is excluded by name.
4. Reject --no-mitm in container mode symmetrically to how container-only flags are rejected in process mode. Container mode always routes through MITM; silently ignoring --no-mitm misled operators about the sandbox's network behavior.

Asset hash updated (init-firewall.sh changed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
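The egress policy from item 2, combined with the allow-list in the PR description (loopback, ESTABLISHED/RELATED, two forwarder ports), can be sketched as a list of commands. The real init-firewall.sh is not shown in this PR page, so the exact rules below are an assumption built from standard iptables syntax; `egressRules` and the example IP are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// rule is one firewall command: the binary followed by its arguments.
type rule []string

// egressRules returns commands that default-deny outbound traffic for both
// address families and allow only loopback, replies on established
// connections, and TCP to hostIP on the given ports.
// Assumed rule set for illustration; not copied from init-firewall.sh.
func egressRules(hostIP string, ports []int) []rule {
	rules := []rule{
		{"ip6tables", "-P", "OUTPUT", "DROP"},
		{"iptables", "-P", "OUTPUT", "DROP"},
		{"iptables", "-A", "OUTPUT", "-o", "lo", "-j", "ACCEPT"},
		{"iptables", "-A", "OUTPUT", "-m", "conntrack", "--ctstate", "ESTABLISHED,RELATED", "-j", "ACCEPT"},
	}
	for _, p := range ports {
		rules = append(rules, rule{
			"iptables", "-A", "OUTPUT", "-d", hostIP,
			"-p", "tcp", "--dport", fmt.Sprint(p), "-j", "ACCEPT",
		})
	}
	return rules
}

func main() {
	// 192.0.2.1 is a documentation address standing in for the resolved
	// host.docker.internal IPv4 address.
	for _, r := range egressRules("192.0.2.1", []int{14321, 14322}) {
		fmt.Println(strings.Join(r, " "))
	}
}
```

Note that no rule admits port 53, which matches the description's "No DNS rule", and that the IPv6 side has no ACCEPT rules at all.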
…iner

Bind-mounts the host's ~/.claude + ~/.claude.json into the sandbox so the container reuses the user's existing Claude login instead of prompting to authenticate on every run. Mutually exclusive with --home-volume-shared.

- Keychain bridge: on macOS the credential lives in Keychain, not on disk, so extract it once via `security find-generic-password` to the file Linux Claude reads inside the container.
- UID remap: on Linux, pass HOST_UID/HOST_GID so entrypoint.sh remaps the baked-in claude user to the invoking user, so writes to the bind mount land owned by the host user, not the container uid.
- CAP_KILL: added so tini at PID 1 (UID 0) can forward TTY signals (SIGWINCH on resize, SIGINT on ^C) to a child running as a different UID. Without it `--cap-drop ALL` makes kill() across UIDs return EPERM and tini fatals on first terminal resize.
- Same host-src validation as user --mount (reject ~/.agent-vault and the docker socket) applies to the bind, so a symlinked agent dir can't launder access to the encrypted vault data.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
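The symlink-laundering defense mentioned in the last bullet boils down to resolving the source before comparing it with the reserved paths. Here is a minimal sketch of that idea, assuming the reserved paths are already absolute and symlink-resolved; `isUnder` and `validateMountSrc` are hypothetical names, not the validator in docker.go, and the sketch only checks whether the source falls inside a reserved path (not the reverse).

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// isUnder reports whether path is root itself or lies inside it.
// Both arguments are expected to be cleaned absolute paths.
func isUnder(path, root string) bool {
	rel, err := filepath.Rel(root, path)
	if err != nil {
		return false
	}
	return rel == "." || (rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator)))
}

// validateMountSrc resolves symlinks in src and rejects the result if it
// falls under any reserved host path.
func validateMountSrc(src string, reserved []string) (string, error) {
	resolved, err := filepath.EvalSymlinks(src)
	if err != nil {
		return "", err
	}
	resolved, err = filepath.Abs(resolved)
	if err != nil {
		return "", err
	}
	for _, r := range reserved {
		if isUnder(resolved, r) {
			return "", fmt.Errorf("mount source %q resolves into reserved path %q", src, r)
		}
	}
	return resolved, nil
}

func main() {
	base, err := os.MkdirTemp("", "vault-demo")
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer os.RemoveAll(base)
	// The temp dir may itself sit behind a symlink (e.g. /tmp on macOS).
	if base, err = filepath.EvalSymlinks(base); err != nil {
		fmt.Println("setup failed:", err)
		return
	}

	secret := filepath.Join(base, ".agent-vault")
	link := filepath.Join(base, "innocent")
	if err := os.Mkdir(secret, 0o700); err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	if err := os.Symlink(secret, link); err != nil {
		fmt.Println("setup failed:", err)
		return
	}

	_, err = validateMountSrc(link, []string{secret})
	fmt.Println("symlinked source rejected:", err != nil)
}
```

Comparing with `filepath.Rel` rather than a raw string prefix avoids treating a sibling such as `.agent-vault-old` as if it were inside `.agent-vault`.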
|
@claude review |
… egress tests (#103)

## Summary

Post-merge audit of the container-mode sandbox (PR #99 + follow-up `--share-agent-dir` work). Four must-fix items from the audit:

- **Exit-code propagation**: container's real exit status now propagates to the parent via a new `ExitCodeError` sentinel unwrapped in `Execute()`. Previously every non-zero exit collapsed to `1`, which broke CI use cases like `vault run -- pytest`. Defers still run: the error returns normally through Cobra before `Execute()` unwraps and `os.Exit(Code)`s.
- **Root-uid guard for `--share-agent-dir`**: reject `uid == 0` on Linux. The `usermod`/`groupmod` remap would hand the in-container `claude` user uid 0 alongside `NET_ADMIN`/`NET_RAW`/`SETUID`/`SETGID`/`KILL` caps, and `no-new-privileges` doesn't disarm ambient caps on root.
- **Expanded `reservedContainerDsts`**: `/`, `/etc` (subtree), `ContainerClaudeConfig` (added by `--share-agent-dir` but never reserved), and both `/usr/local/sbin/{init-firewall,entrypoint}.sh`. Without `ContainerClaudeConfig` on the list, a user `--mount` could override the bind-mounted `~/.claude.json`.
- **Egress-bypass integration tests**: `TestIntegration_EgressBlocked_Bypasses` covers the channels a compromised agent would actually try: IPv6 literal, UDP, ICMP, `curl --noproxy '*'`, and an env-stripped `HTTPS_PROXY` bypass. Shared `runInFirewalledContainer` helper also collapses the existing end-to-end test's duplicated docker argv. `iputils-ping` baked into the image so the ICMP probe doesn't `apt-get install` per test run (asset hash bumped).
## Test plan

- [x] `make test` green (all unit tests)
- [x] `go vet -tags docker_integration ./...` clean
- [ ] `go test -tags docker_integration ./internal/sandbox/ -run Integration -v` on Linux + macOS (reviewer, please run; requires Docker)
- [ ] Manual: `./agent-vault run --sandbox=container -- sh -c 'exit 42'` returns 42 (not 1)
- [ ] Manual: `./agent-vault run --sandbox=container --mount /tmp/x:/home/claude/.claude.json -- true` rejects
- [ ] Manual on Linux: `sudo ./agent-vault run --sandbox=container --share-agent-dir -- true` rejects with the new error

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
`agent-vault vault run --sandbox=container -- <agent>` launches the child inside a Docker container whose egress is locked down by iptables. Only the Agent Vault proxy is reachable; everything else is dropped at the kernel, closing the cooperative-sandbox escape hatches (unsetting `HTTPS_PROXY`, raw sockets, DNS exfil, subprocesses that don't inherit env).
- `--sandbox=container` or `AGENT_VAULT_SANDBOX=container`; `--sandbox=process` stays default.
- `host.docker.internal`, and routing container traffic via a loopback-dial forwarder means `isLoopbackPeer` still exempts it from TierAuth rate limits.

How it works
- `agent-vault-<sessionID>`, labeled so `PruneStaleNetworks` can reconcile on next run with a 60s grace window to avoid racing with freshly-created peers). Not the default bridge; sibling containers cannot reach the forwarder.
- `14321` + `14322`. Preserves the MITM's SNI-based leaf minting (client sees `host.docker.internal`, matching leaf is minted on demand).
- `--image`.
- `init-firewall.sh`): `OUTPUT DROP` default; `ACCEPT` only loopback, ESTABLISHED/RELATED, and the two forwarder ports at `host.docker.internal`. No DNS rule: resolved via `/etc/hosts` from `--add-host=host-gateway`, closing the DNS-exfil channel.
- `claude` user (gosu post-init). `--cap-drop=ALL` + `--cap-add=NET_ADMIN,NET_RAW` (only init-firewall uses them; non-root process post-gosu doesn't get them as ambient caps). `--security-opt=no-new-privileges`.

Changes
CLI
- `vault run`: `--sandbox` (enum, parse-time validated), `--image`, `--mount` (symlink-resolved, reserved-path rejection), `--keep`, `--no-firewall`, `--home-volume-shared` (cmd/sandbox_flag.go, cmd/run.go, cmd/run_container.go).
- `syscall.Exec("docker", …)` so TTY/signals propagate naturally (cmd/run_container.go).

internal/sandbox/ (new package)
- `env.go`: shared `BuildProxyEnv` (now used by the process path in cmd/run.go too, eliminating the drift risk across sources that emit the 9 MITM env vars) + container-specific `BuildContainerEnv`.
- `docker.go`: pure `BuildRunArgs` + mount validator with `filepath.EvalSymlinks` defense against symlink-laundering forbidden host paths.
- `forwarder.go`: context-cancellable two-port TCP relay; listeners are FD_CLOEXEC so they close cleanly when the caller execs docker.
- `network.go`: `CreatePerInvocationNetwork` + `PruneStaleNetworks` with 60s grace window and `label=agent-vault-sandbox=1 AND name=agent-vault-*` double filter.
- `gateway.go`: `HostBindIP` (loopback on macOS/Windows, bridge gateway on Linux).
- `cacopy.go`: CA bind-mount at `~/.agent-vault/sandbox/ca-<sid>.pem` (0o644 via explicit `Chmod`, parent 0o700). SessionID hex-regex validated so it can't traverse paths. 24h prune of stale files.
- `image.go`: `EnsureImage` with content-hash tag caching (`agent-vault/sandbox:<hash>`), build-on-first-use via go:embed'd assets.
- `assets/{Dockerfile,init-firewall.sh,entrypoint.sh}`: embedded into the binary.

Regression tests (cross-package invariants the sandbox depends on)
- `validateSNI("host.docker.internal") == (false, nil)` so tightening SNI validation without updating this test would silently break the container path.

Docs
- `vault run` table.
- `AGENT_VAULT_SANDBOX`.
- `host.docker.internal` as the proxy host inside the container.

Test plan
- `go build ./...` clean
- `go build -tags docker_integration ./...` clean
- `go test -race ./...` all green (unit + sandbox-package tests)
- `--sandbox` rejects bogus values at flag-parse time: `agent-vault vault run --sandbox=bogus -- claude` → `invalid argument "bogus" for "--sandbox" flag: must be one of: process, container`
- `agent-vault vault run --sandbox=container -- claude --version` on Linux + macOS; confirm first-run image build succeeds and subsequent runs hit the cache
- `agent-vault vault run --sandbox=container -- bash -lc 'curl --max-time 3 https://1.1.1.1; echo exit=$?'` → non-zero exit (SYN dropped)
- `curl -fsS https://api.github.com/zen` succeeds via the broker
- `getent hosts google.com` fails (no DNS rule)
- `agent-vault vault run --sandbox=container -- whoami` prints `claude`
- `docker kill` a running container mid-session; next `vault run` prunes the leaked network
- `go test -tags docker_integration ./internal/sandbox/` on a machine with docker running

🤖 Generated with Claude Code