Changes from 4 commits
4 changes: 4 additions & 0 deletions .env.example
@@ -35,6 +35,10 @@
# Development (optional)
# AGENT_VAULT_DEV_MODE=false # when true, allows internal/localhost hosts in proposals

# Sandbox mode for `agent-vault vault run` (optional)
# process (default, cooperative) | container (non-cooperative Docker sandbox with iptables egress lock)
# AGENT_VAULT_SANDBOX=process

# Observability (optional)
# AGENT_VAULT_LOG_LEVEL=info # info (default) | debug — debug emits one line per proxied request (no secret values)

1 change: 1 addition & 0 deletions CLAUDE.md
@@ -31,6 +31,7 @@ make docker # Multi-stage Docker image; data persisted at /data/.agent-vau
- Vault role: `proxy` < `member` < `admin`. Proxy can use the proxy and raise proposals; member can manage credentials/services; admin can invite humans.
- **KEK/DEK key wrapping**: A random DEK (Data Encryption Key) encrypts credentials and the CA key at rest (AES-256-GCM). If a master password is set, Argon2id derives a KEK (Key Encryption Key) that wraps the DEK; changing the password re-wraps the DEK without re-encrypting credentials. If no password is set (passwordless mode), the DEK is stored in plaintext — suitable for PaaS deploys where volume security is the trust boundary. Login uses email+password or Google OAuth. The first user to register becomes the instance owner and is auto-granted vault admin on `default`.
- **Agent skills are the agent-facing contract.** [cmd/skill_cli.md](cmd/skill_cli.md) and [cmd/skill_http.md](cmd/skill_http.md) are embedded into the binary, installed by `vault run`, and served publicly at `/v1/skills/{cli,http}`. They are the authoritative reference for what agents can do.
- **Two sandbox modes for `vault run`** (selected via `--sandbox` or `AGENT_VAULT_SANDBOX`): `process` (default, cooperative — fork+exec with `HTTPS_PROXY` envvars) and `container` (non-cooperative — Docker container with iptables egress locked to the Agent Vault proxy). Container mode lives in [internal/sandbox/](internal/sandbox/) with an embedded Dockerfile + init-firewall.sh + entrypoint.sh, built on first use and cached by content hash.

## Where to look for details

8 changes: 8 additions & 0 deletions README.md
@@ -87,6 +87,14 @@ agent-vault vault run -- claude

The agent calls APIs normally (e.g. `fetch("https://api.github.com/...")`). Agent Vault intercepts the request, injects the credential, and forwards it upstream. The agent never sees secrets.

For **non-cooperative** sandboxing — where the child physically cannot reach anything except the Agent Vault proxy, regardless of what it tries — launch it in a Docker container with egress locked down by iptables:

```bash
agent-vault vault run --sandbox=container -- claude
```

See [Container sandbox](https://docs.agent-vault.dev/guides/container-sandbox) for the threat model and flags.

### SDK — sandboxed agents (Docker, Daytona, E2B)

For agents running inside containers, use the SDK from your orchestrator to mint a session and pass proxy config into the sandbox:
94 changes: 55 additions & 39 deletions cmd/run.go
@@ -13,6 +13,7 @@ import (
"strings"
"syscall"

"github.com/Infisical/agent-vault/internal/sandbox"
"github.com/Infisical/agent-vault/internal/session"
"github.com/Infisical/agent-vault/internal/store"
"github.com/charmbracelet/huh"
@@ -25,6 +26,10 @@ var skillCLI string
//go:embed skill_http.md
var skillHTTP string

// sandboxMode is enum-typed so `--sandbox=foo` fails at flag-parse time
// with the allowed set, rather than deep inside RunE.
var sandboxMode SandboxMode
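
The `SandboxMode` definition is not part of this diff; a sketch of what such an enum-typed flag typically looks like, assuming a pflag.Value-style `String`/`Set`/`Type` implementation (names and error text are illustrative):

```go
package main

import "fmt"

// SandboxMode implements the three methods of pflag's Value interface,
// which is what lets cobra reject a bad --sandbox value at parse time.
type SandboxMode string

const (
	SandboxProcess   SandboxMode = "process"
	SandboxContainer SandboxMode = "container"
)

func (m *SandboxMode) String() string { return string(*m) }

// Set validates against the allowed set; cobra surfaces the error
// during flag parsing, before RunE ever executes.
func (m *SandboxMode) Set(v string) error {
	switch SandboxMode(v) {
	case SandboxProcess, SandboxContainer:
		*m = SandboxMode(v)
		return nil
	}
	return fmt.Errorf("must be one of: process, container (got %q)", v)
}

func (m *SandboxMode) Type() string { return "mode" }

func main() {
	var mode SandboxMode
	fmt.Println(mode.Set("container"), mode)
	fmt.Println(mode.Set("foo"))
}
```

The same `Set` method is what the `AGENT_VAULT_SANDBOX` fallback below reuses, so the env var gets identical validation to the flag.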

var runCmd = &cobra.Command{
Use: "run [flags] -- <command> [args...]",
Short: "Wrap an agent process with Agent Vault access",
@@ -53,6 +58,24 @@ Example:
Args: cobra.MinimumNArgs(1),
DisableFlagsInUseLine: true,
RunE: func(cmd *cobra.Command, args []string) error {
// 0. Resolve sandbox mode and validate flag compatibility before any
// network I/O — the user sees conflicts immediately, not after
// a slow session-mint round-trip.
mode := sandboxMode
if mode == "" {
if v := os.Getenv("AGENT_VAULT_SANDBOX"); v != "" {
if err := mode.Set(v); err != nil {
return fmt.Errorf("AGENT_VAULT_SANDBOX: %w", err)
}
}
}
if mode == "" {
mode = SandboxProcess
}
if err := validateSandboxFlagConflicts(cmd, mode); err != nil {
return err
}

// 1. Load the admin session from agent-vault auth login.
sess, err := ensureSession()
if err != nil {
@@ -78,6 +101,10 @@
return err
}

if mode == SandboxContainer {
return runContainer(cmd, args, scopedToken, addr, vault)
}

// 4. Resolve the target binary.
binary, err := exec.LookPath(args[0])
if err != nil {
@@ -276,23 +303,19 @@ func fetchUserVaults(addr, token string) ([]string, error) {
return names, nil
}

// mitmInjectedKeys is the set of env keys augmentEnvWithMITM manages on
// the child. Any pre-existing occurrence inherited from os.Environ() must
// be stripped before the new values are appended — POSIX getenv returns
// the *first* match in C code paths (glibc, curl, libcurl-backed Python),
// so a stale corporate HTTPS_PROXY from the parent shell would otherwise
// silently win and the MITM route would be bypassed entirely.
var mitmInjectedKeys = map[string]struct{}{
"HTTPS_PROXY": {},
"NO_PROXY": {},
"NODE_USE_ENV_PROXY": {},
"SSL_CERT_FILE": {},
"NODE_EXTRA_CA_CERTS": {},
"REQUESTS_CA_BUNDLE": {},
"CURL_CA_BUNDLE": {},
"GIT_SSL_CAINFO": {},
"DENO_CERT": {},
}
// mitmInjectedKeys is the keyset that BuildProxyEnv emits. Any
// pre-existing occurrence inherited from os.Environ() must be stripped
// before the new values are appended — POSIX getenv returns the *first*
// match in C code paths (glibc, curl, libcurl-backed Python), so a stale
// corporate HTTPS_PROXY from the parent shell would otherwise silently
// win and the MITM route would be bypassed entirely.
var mitmInjectedKeys = func() map[string]struct{} {
m := make(map[string]struct{}, len(sandbox.ProxyEnvKeys))
for _, k := range sandbox.ProxyEnvKeys {
m[k] = struct{}{}
}
return m
}()

// stripEnvKeys returns env with every entry whose key (the part before
// '=') appears in keys removed. Case-sensitive, matching how the kernel
@@ -359,30 +382,16 @@ func augmentEnvWithMITM(env []string, addr, token, vault, caPath string) ([]stri
mitmHost = h
}
}
scheme := "http"
if mitmTLS {
scheme = "https"
}
proxyURL := (&url.URL{
Scheme: scheme,
User: url.UserPassword(token, vault),
Host: fmt.Sprintf("%s:%d", mitmHost, port),
}).String()

env = stripEnvKeys(env, mitmInjectedKeys)
// CA trust variables must stay in sync with buildProxyEnv() in
// sdks/sdk-typescript/src/resources/sessions.ts.
env = append(env,
"HTTPS_PROXY="+proxyURL,
"NO_PROXY=localhost,127.0.0.1",
"NODE_USE_ENV_PROXY=1",
"SSL_CERT_FILE="+caPath,
"NODE_EXTRA_CA_CERTS="+caPath,
"REQUESTS_CA_BUNDLE="+caPath,
"CURL_CA_BUNDLE="+caPath,
"GIT_SSL_CAINFO="+caPath,
"DENO_CERT="+caPath,
)
env = append(env, sandbox.BuildProxyEnv(sandbox.ProxyEnvParams{
Host: mitmHost,
Port: port,
Token: token,
Vault: vault,
CAPath: caPath,
MITMTLS: mitmTLS,
})...)
return env, port, true, nil
}

@@ -440,5 +449,12 @@ func init() {
runCmd.Flags().Int("ttl", 0, "Session TTL in seconds (300–604800; default: server default 24h)")
runCmd.Flags().Bool("no-mitm", false, "Skip HTTPS_PROXY/CA env injection for the child (explicit /proxy only)")

runCmd.Flags().Var(&sandboxMode, "sandbox", "Sandbox mode: process (default) or container")
runCmd.Flags().String("image", "", "Container image override (requires --sandbox=container)")
runCmd.Flags().StringArray("mount", nil, "Extra bind mount src:dst[:ro] (repeatable; requires --sandbox=container)")
runCmd.Flags().Bool("keep", false, "Don't pass --rm to docker (requires --sandbox=container)")
runCmd.Flags().Bool("no-firewall", false, "Skip iptables egress rules inside the container (requires --sandbox=container; debug only)")
runCmd.Flags().Bool("home-volume-shared", false, "Share /home/claude/.claude across invocations (requires --sandbox=container); default is a per-invocation volume, losing auth state but avoiding concurrency corruption")

vaultCmd.AddCommand(runCmd)
}
171 changes: 171 additions & 0 deletions cmd/run_container.go
@@ -0,0 +1,171 @@
package cmd

import (
"context"
"errors"
"fmt"
"net/url"
"os"
"os/exec"
"runtime"
"strconv"
"syscall"
"time"

"github.com/spf13/cobra"
"golang.org/x/term"

"github.com/Infisical/agent-vault/internal/sandbox"
)

// containerOnlyFlags are no-ops in process mode; we reject them explicitly
// rather than silently ignoring them, which would be a foot-gun.
var containerOnlyFlags = []string{"image", "mount", "keep", "no-firewall", "home-volume-shared"}

func validateSandboxFlagConflicts(cmd *cobra.Command, mode SandboxMode) error {
if mode == SandboxContainer {
return nil
}
for _, name := range containerOnlyFlags {
f := cmd.Flags().Lookup(name)
if f == nil {
continue
}
if f.Changed {
return fmt.Errorf("--%s requires --sandbox=container", name)
}
}
return nil
Comment on lines +31 to +48

🟡 The --no-mitm flag is silently accepted and ignored when --sandbox=container is used, even though container mode always routes through MITM and cannot bypass it. This is inconsistent with the design principle stated in the adjacent code, which rejects container-only flags in process mode "rather than silently ignoring them, which would be a foot-gun"; the same principle should apply symmetrically to process-only flags in container mode.

Extended reasoning...

What the bug is and how it manifests

validateSandboxFlagConflicts (cmd/run_container.go:25-38) returns nil immediately when mode == SandboxContainer, skipping any validation of process-only flags. As a result, agent-vault vault run --no-mitm --sandbox=container -- claude accepts the flag without error or warning, even though --no-mitm has absolutely zero effect in container mode — the container path always calls fetchMITMCA and always routes all traffic through the MITM proxy.

The specific code path that triggers it

When the user passes --no-mitm --sandbox=container, validateSandboxFlagConflicts is called at cmd/run.go:75, but returns immediately at the if mode == SandboxContainer { return nil } branch on run_container.go:27. runContainer never calls cmd.Flags().GetBool("no-mitm") at any point — the flag is simply never consulted. The code even documents this on run_container.go:62: "Container mode always routes through MITM — --no-mitm is a process-mode-only escape hatch."

Why existing code doesn't prevent it

The containerOnlyFlags list only enumerates flags that are container-only (image, mount, keep, no-firewall, home-volume-shared). There is no corresponding list of process-only flags (like --no-mitm) that should be rejected in container mode. The validation function is asymmetric by construction.

Why the design principle demands symmetry

The comment on lines 21-23 explicitly states the governing rule: "containerOnlyFlags are no-ops in process mode; we reject them explicitly rather than silently ignoring them, which would be a foot-gun." This exact principle applies in reverse: --no-mitm is a no-op in container mode. A user who passes it may reasonably believe the MITM proxy is bypassed — particularly because --no-mitm is a meaningful, effective escape hatch in process mode (it disables all HTTPS_PROXY injection entirely).

Impact

No security regression: container mode enforces MITM at the iptables level regardless of what flags are passed, so the MITM is never actually bypassed. The impact is purely UX/correctness — a user who passes --no-mitm --sandbox=container gets no feedback that their flag is a no-op, which contradicts the stated design principle and could mislead the operator about the sandbox's actual network behavior.

Step-by-step proof

  1. User runs: agent-vault vault run --no-mitm --sandbox=container -- claude
  2. RunE resolves mode = SandboxContainer and calls validateSandboxFlagConflicts(cmd, SandboxContainer)
  3. validateSandboxFlagConflicts hits line 27: if mode == SandboxContainer { return nil } — returns immediately with no error
  4. RunE proceeds to runContainer(cmd, args, ...) (cmd/run.go:101-103)
  5. runContainer calls fetchMITMCA unconditionally (line ~60) and routes all HTTPS through MITM — --no-mitm is never read
  6. User believes MITM is disabled; MITM is fully active

How to fix

Add a processOnlyFlags list (e.g. ["no-mitm"]) and check it symmetrically inside validateSandboxFlagConflicts when mode == SandboxContainer, returning an error such as "--no-mitm is not supported in container mode (MITM is always active)". Alternatively, emit a fmt.Fprintf(os.Stderr, "warning: --no-mitm has no effect in container mode") instead of a hard error, matching the pattern used by --no-firewall.
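
A minimal sketch of that symmetric check, with cobra's `Flags().Changed` lookups stood in by a plain map; the `processOnlyFlags` list and error texts are assumptions, while `containerOnlyFlags` matches the list in the diff:

```go
package main

import "fmt"

// Flags meaningful in only one sandbox mode.
var containerOnlyFlags = []string{"image", "mount", "keep", "no-firewall", "home-volume-shared"}
var processOnlyFlags = []string{"no-mitm"} // hypothetical, per the suggested fix

// validate rejects mode-incompatible flags in both directions.
// `changed` stands in for cobra's FlagSet.Changed; mode is
// "process" or "container".
func validate(mode string, changed map[string]bool) error {
	deny, reason := containerOnlyFlags, "requires --sandbox=container"
	if mode == "container" {
		deny, reason = processOnlyFlags, "has no effect in container mode (MITM is always active)"
	}
	for _, name := range deny {
		if changed[name] {
			return fmt.Errorf("--%s %s", name, reason)
		}
	}
	return nil
}

func main() {
	fmt.Println(validate("container", map[string]bool{"no-mitm": true}))
	fmt.Println(validate("process", map[string]bool{"keep": true}))
	fmt.Println(validate("process", map[string]bool{"no-mitm": true}))
}
```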

}

// runContainer launches the target agent inside a Docker container with
// egress locked to the agent-vault proxy via iptables.
func runContainer(cmd *cobra.Command, args []string, scopedToken, addr, vault string) error {
if runtime.GOOS != "linux" && runtime.GOOS != "darwin" {
return fmt.Errorf("--sandbox=container: only linux and darwin are supported in v1 (got %s)", runtime.GOOS)
}
if _, err := exec.LookPath("docker"); err != nil {
return errors.New("--sandbox=container: `docker` not found in PATH")
}

ctx := cmd.Context()
if ctx == nil {
ctx = context.Background()
}

// Housekeeping: trim old CA tempfiles and networks from crashed runs
// before we create new ones. Both are best-effort.
sandbox.PruneHostCAFiles()
_ = sandbox.PruneStaleNetworks(ctx, sandbox.DefaultPruneGrace)

// Pull the MITM CA from the server. Container mode always routes
// through MITM — --no-mitm is a process-mode-only escape hatch.
pem, mitmPort, mitmEnabled, mitmTLS, err := fetchMITMCA(addr)
if err != nil {
return fmt.Errorf("fetch MITM CA: %w", err)
}
if !mitmEnabled {
return errors.New("--sandbox=container requires the MITM proxy; server has it disabled")
}
if mitmPort == 0 {
mitmPort = DefaultMITMPort
}

// Upstream agent-vault HTTP port for the forwarder. Parsed from
// --address / session address, with DefaultPort as a fallback.
upstreamHTTPPort := DefaultPort
if u, perr := url.Parse(addr); perr == nil {
if p, cerr := strconv.Atoi(u.Port()); cerr == nil && p > 0 {
upstreamHTTPPort = p
}
}

sessionID, err := sandbox.NewSessionID()
if err != nil {
return err
}

hostCAPath, err := sandbox.WriteHostCAFile(pem, sessionID)
if err != nil {
return fmt.Errorf("write CA: %w", err)
}

network, err := sandbox.CreatePerInvocationNetwork(ctx, sessionID)
if err != nil {
return fmt.Errorf("create docker network: %w", err)
}
defer func() {
// Only runs on error arms — syscall.Exec below replaces the
// process, bypassing defers. Use a detached context so a
// parent ctx cancel doesn't skip the cleanup exec itself.
cleanup, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
_ = sandbox.RemoveNetwork(cleanup, network.Name)
}()

bindIP := sandbox.HostBindIP(network)
if bindIP == nil {
return errors.New("could not determine host bind IP for forwarder")
}

fwd, err := sandbox.StartForwarder(ctx, bindIP, upstreamHTTPPort, mitmPort)
if err != nil {
return fmt.Errorf("start forwarder: %w", err)
}
defer func() { _ = fwd.Close() }()

image, _ := cmd.Flags().GetString("image")
imageRef, err := sandbox.EnsureImage(ctx, image, os.Stderr)
if err != nil {
return err
}

workDir, err := os.Getwd()
if err != nil {
return fmt.Errorf("getwd: %w", err)
}

Comment on lines +146 to +150

🔴 The current working directory is bind-mounted read-write at /workspace without the same ~/.agent-vault protection applied to user-supplied --mount flags. If a user runs vault run --sandbox=container while CWD is inside ~/.agent-vault (or a symlink resolving there), the vault directory — containing the encrypted credential database, the MITM CA private key, and session tokens — is exposed read-write to the container. The fix is to call validateHostSrc(workDir, home) on the os.Getwd() result before passing it to BuildRunArgs, mirroring the protection already applied to --mount entries.

Extended reasoning...

The bug

In cmd/run_container.go (lines 123–138), workDir is obtained via os.Getwd() and passed directly as Config.WorkDir to sandbox.BuildRunArgs. Inside BuildRunArgs (docker.go:103), it is added unconditionally as -v cfg.WorkDir:/workspace with no path validation whatsoever.

Existing protection is asymmetric

User-supplied --mount values flow through parseAndValidateMount → validateHostSrc (docker.go:152–175), which calls filepath.EvalSymlinks to resolve symlinks and then checks whether the resolved path equals or is nested under ~/.agent-vault. The workDir path skips this check entirely. The protection intent is explicit and deliberate in the --mount path; its absence on the workspace mount is an oversight.

Step-by-step proof

  1. Developer has agent-vault installed locally with a vault at ~/.agent-vault/.
  2. Developer navigates: cd ~/.agent-vault
  3. Developer runs: agent-vault vault run --sandbox=container -- claude
  4. os.Getwd() returns /home/user/.agent-vault
  5. BuildRunArgs appends -v /home/user/.agent-vault:/workspace to the docker argv
  6. The container starts with the entire vault directory mounted read-write at /workspace
  7. The container agent can now: read ca/ca.pem (MITM CA private key used to sign TLS leaves for all intercepted HTTPS traffic), read vault.db (credential database), and overwrite any of these files (key replacement attack)

The same scenario triggers if CWD is any subdirectory of ~/.agent-vault, or if a symlink in a normal-looking path resolves to somewhere under ~/.agent-vault.

Impact

In passwordless mode (DEK stored in plaintext — the documented default for local/PaaS use), the CA private key stored under ~/.agent-vault/ca/ is directly readable and the database is decryptable. In password-protected mode, write access allows an attacker to replace the CA key so future MITM intercepts use an attacker-controlled key. Both scenarios violate the core sandbox guarantee. The iptables egress lock is not relevant here — the attacker reads/writes the host filesystem via the bind mount, not the network.

Fix

Before passing workDir to BuildRunArgs, call the already-existing validateHostSrc (or an exported wrapper) with the os.UserHomeDir() result. This brings the workspace mount in line with the protection already applied to user-supplied mounts. Alternatively, BuildRunArgs itself could apply the check when cfg.WorkDir is set, since it already calls os.UserHomeDir() for the --mount path.

env := sandbox.BuildContainerEnv(scopedToken, vault, fwd.HTTPPort, fwd.MITMPort, mitmTLS)

mounts, _ := cmd.Flags().GetStringArray("mount")
keep, _ := cmd.Flags().GetBool("keep")
noFirewall, _ := cmd.Flags().GetBool("no-firewall")
homeShared, _ := cmd.Flags().GetBool("home-volume-shared")

dockerArgs, err := sandbox.BuildRunArgs(sandbox.Config{
ImageRef: imageRef,
SessionID: sessionID,
WorkDir: workDir,
HostCAPath: hostCAPath,
NetworkName: network.Name,
AttachTTY: term.IsTerminal(int(os.Stdin.Fd())),
Keep: keep,
NoFirewall: noFirewall,
HomeVolumeShared: homeShared,
Mounts: mounts,
Env: env,
CommandArgs: args,
})
if err != nil {
return err
}

dockerBin, err := exec.LookPath("docker")
if err != nil {
return err
}

if noFirewall {
fmt.Fprintln(os.Stderr, "agent-vault: WARNING --no-firewall active, container egress is unrestricted")
}
fmt.Fprintf(os.Stderr, "%s routing container HTTPS through MITM on %s:%d (container view: host.docker.internal:%d)\n",
successText("agent-vault:"), bindIP, fwd.MITMPort, fwd.MITMPort)
fmt.Fprintf(os.Stderr, "%s starting %s in sandbox (%s)...\n\n",
successText("agent-vault:"), boldText(args[0]), network.Name)

// Exec docker directly so the controlling TTY, SIGINT, SIGWINCH
// propagate naturally. Listeners are FD_CLOEXEC so they close at
// exec; per-conn forwarder goroutines die with the replaced process
// image. On success this never returns.
return syscall.Exec(dockerBin, append([]string{"docker"}, dockerArgs...), os.Environ())
}