Skip to content

feat(sandbox): inject user env vars and secret refs into execution#12655

Open
anticorrelator wants to merge 6 commits intoversion-sandboxesfrom
dustin/sandbox-secrets
Open

feat(sandbox): inject user env vars and secret refs into execution#12655
anticorrelator wants to merge 6 commits intoversion-sandboxesfrom
dustin/sandbox-secrets

Conversation

@anticorrelator
Copy link
Copy Markdown
Contributor

@anticorrelator anticorrelator commented Apr 14, 2026

Adds user-defined environment variables to sandbox code execution and locks down the broader unified config contract that governs how every per-adapter capability (env vars, dependencies, internet access) is advertised, stored, rendered, and enforced. Each sandbox config carries a list of env var entries — either literal name/value pairs or secret references that point to an encrypted key in the Secret table — plus internet_access and dependencies fields. SANDBOX_ADAPTER_METADATA advertises which capabilities each adapter supports so the frontend renders only the editors that apply and the runtime fails closed on anything else.

At resolution time literals pass through directly, secret refs are decrypted from the Secret table (with a clear error when a referenced key is missing), and the merged env dict is forwarded to the underlying provider SDK for all seven adapters (E2B, Deno, Daytona, Vercel Python/TypeScript, Modal, WASM). The settings dialog gets a new env-vars editor with a secret picker populated from the existing secrets query, so admins can bind a sandbox env var to an existing secret without re-typing the plaintext.

PR #12655 originally introduced the shape of a unified config (three structured sections: env_vars, dependencies, internet_access, plus three capability flags on AdapterMetadata) but never locked it as a contract. This change writes the contract down and reconciles every layer so the rules are enforced rather than emergent:

  • AdapterMetadata is the canonical source of capability advertisement. Class-level docstring + per-field comments specify, for each capability, what the values mean, what build_backend must do, and how the UI should render. SandboxAdapter.build_backend and validate_config cross-reference it so future adapter authors find the rules from source.
  • Fail-closed runtime, no silent drops. Every adapter × capability cell is now either flag-true + wired end-to-end or flag-false + build_backend raises UnsupportedOperation on non-empty input. MODAL supports_env_vars flipped to True (the runtime always worked); ModalSandboxBackend.execute() now raises on per-call env overrides instead of silently warning.
  • Preserve-on-save UI. formValuesToConfigPatch (extracted to utils.tsx) deep-merges form state into a clone of the raw stored config, preserving capability-gated keys when the flag is false or activeBackend is undefined. The Advanced Config JSON editor cannot create or mutate env_vars / internet_access / dependencies — only the structured editors author them. Hidden capability sections render a single muted "Not supported by the selected backend." placeholder so users see the gap is intentional. InternetAccessMode.ALLOWLIST is reserved-for-future and not user-selectable.
  • Parametrized conformance tests. New test_unified_config_contract.py iterates over SANDBOX_ADAPTER_METADATA.keys() and asserts flag/config-model/runtime agreement for all three capabilities; test_capability_advertisement.py and test_queries.py were refactored to parametrize from the same registry source. Adding a new adapter automatically participates — no hand-maintained per-adapter list can drift.

Env vars land on the sandbox config and resolve into the dict that feeds build_backend(), rather than as a per-call argument to execute(). This was driven by Modal's lifecycle: Sandbox.create() accepts env_dict but sandbox.exec() does not, so env must land at sandbox-creation time. Routing env through the config dict gives every adapter a single injection point that maps cleanly to whatever its SDK supports — and makes env vars uniform with the other two capabilities the contract now covers.

                  ┌──────────────────────────────────────────────┐
                  │   AdapterMetadata  (canonical contract)      │
                  │   - supports_env_vars: bool                  │
                  │   - internet_access: Literal[none|bool|...]  │
                  │   - dependencies_language: Optional[Literal] │
                  │   [docstring = unified config contract]      │
                  └──────────────────────┬───────────────────────┘
                                         │ SANDBOX_ADAPTER_METADATA
                                         │ (single registry)
         ┌───────────────────────────────┼───────────────────────────────┐
         │ renders from                  │ enforced by                   │ parametrized by
         ▼                               ▼                               ▼
 ┌────────────────────────┐   ┌────────────────────────┐   ┌──────────────────────────┐
 │  SandboxConfigDialog   │   │  SandboxBackendInfo    │   │  test_unified_config_    │
 │  env vars editor +     │   │  (GQL)                 │   │  contract                │
 │  secret picker         │   │  - supportsEnvVars     │   │  test_capability_        │
 │  flags → render only   │   │  - internetAccess      │   │  advertisement           │
 │  preserve-on-save      │   │  - dependenciesLang    │   │  test_queries            │
 │  "Not supported by..." │   └────────────────────────┘   │  iterate METADATA.keys() │
 │  utils.tsx (testable)  │                                │  flag ↔ model ↔ runtime  │
 └────────────┬───────────┘                                └──────────────────────────┘
              │ writes structured sections only
              ▼
 ┌──────────────────────────┐        ┌──────────────────────────┐
 │     sandbox_configs      │        │       Secret table       │
 │  env_vars · dependencies │        │    (Fernet-encrypted)    │
 │     · internet_access    │        └────────────┬─────────────┘
 │  (literal | secret_ref)  │                     │ decrypt
 └────────────┬─────────────┘                     │ secret_refs
              │                                   │
              ▼                                   ▼
 ┌───────────────────────────────────────────────────────────────┐
 │                     _resolve_user_env                         │
 │   literals pass through · secret_refs decrypted from table    │
 │          MissingSecretError if referenced key absent          │
 └────────────────────────────┬──────────────────────────────────┘
                              │ merged env dict
                              ▼
 ┌───────────────────────────────────────────────────────────────┐
 │       get_or_create_backend → build_backend(config)           │
 │                                                               │
 │   flag=True   → wire end-to-end (SDK forward)                 │
 │   flag=False  → raise UnsupportedOperation                    │
 │                 (fail-closed; no silent drops)                │
 └────────────────────────────┬──────────────────────────────────┘
                              │
            ┌─────────────────┴────────────────┐
            ▼                                  ▼
 ┌──────────────────────┐          ┌──────────────────────┐
 │ Per-exec injection   │          │ Create-time injection│
 │ E2B · Deno · Daytona │          │    Vercel · Modal    │
 │         WASM         │          │  (Modal per-call env │
 │                      │          │   now raises Unsupp.)│
 └──────────────────────┘          └──────────────────────┘

Test plan

  • Unit tests for env var resolution: literal pass-through, secret-ref decryption, missing-secret error path
  • Unit tests for each backend forwarding the merged env dict to its SDK call
  • Unit tests asserting capability advertisement per adapter (supports_env_vars, internet_access, dependencies_language)
  • End-to-end resolution test covering the full config → resolver → backend chain
  • Parametrized conformance test (test_unified_config_contract.py) iterating SANDBOX_ADAPTER_METADATA.keys() — flag/config-model/runtime agreement for all three capabilities
  • test_capability_advertisement.py and test_queries.py refactored to parametrize from the same registry source (no duplicate hard-coded matrix)
  • test_user_env_forwarding.py covers MODAL's new execute-time policy (per-call execute(env=...) raises UnsupportedOperation; session-level user_env still forwarded)
  • Frontend unit tests (formValuesToConfigPatch.test.ts) covering the three preserve-on-save data-loss scenarios: flag flip False→True→False, activeBackend undefined, non-UI-path stored data
  • Playwright coverage for the settings-page env var editor and secret picker
  • Manual verification: create a sandbox config with a literal env var and a secret ref, run a code evaluator, confirm both values land in the sandboxed process
  • Manual verification: delete a referenced secret and confirm the resulting execution error surfaces the missing key name
  • Manual verification: configure a sandbox with internet_access set to a non-"none" mode via the API and confirm build_backend raises UnsupportedOperation (fail-closed)
  • Manual verification: round-trip a sandbox config in the UI with an unrelated capability flag flipped — confirm capability-gated keys persist (preserve-on-save)

…box execution

Extends per-adapter config schemas with env_vars (literal values or references to
the encrypted Secret table), internet_access, and dependencies fields. Wires
resolution through get_or_create_backend and forwards merged env to all seven
adapters. Adds SandboxConfigDialog UI for managing entries with a secret picker,
capability advertisement per adapter, and end-to-end tests.
@anticorrelator anticorrelator requested review from a team as code owners April 14, 2026 06:41
@github-project-automation github-project-automation bot moved this to 📘 Todo in phoenix Apr 14, 2026
@dosubot dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Apr 14, 2026
Comment thread src/phoenix/server/sandbox/daytona_backend.py Outdated
Add user_env parameter to test-only SandboxAdapter subclasses to match
the updated base signature, and apply ruff formatting to env_var_end_to_end.
Generated subprocess.run args were embedding self._packages as a nested
list literal, which raises TypeError at runtime. Use * unpacking so each
package becomes its own argv element.

Reported by Claude Code Review on PR #12655.
…back with GQL mutations

Adds 2-tier credential resolution (DB Secret → env var) for sandbox
provider authentication, centralized in `get_or_create_backend()` so
adapters stay credential-agnostic. Introduces `EnvVarSpec` as the
declarative contract between the factory and each adapter, and exposes
`setSandboxCredential` / `deleteSandboxCredential` GraphQL mutations
(with `env_var_specs` validation and `_BACKEND_CACHE` invalidation) so
admins can rotate sandbox credentials via the UI without a server
restart.
Comment thread src/phoenix/server/api/mutations/sandbox_config_mutations.py
Comment thread src/phoenix/server/sandbox/vercel_backend.py
Comment thread src/phoenix/server/sandbox/__init__.py
…ve-on-save UI

Lock the unified env_vars/dependencies/internet_access contract across all
sandbox adapters. AdapterMetadata is the canonical source of capability
advertisement; build_backend MUST raise UnsupportedOperation for any
capability-gated input the adapter cannot fulfill (no silent drops).

Backend reconciliation:
- MODAL: flip supports_env_vars=True (runtime already worked); make
  ModalSandboxBackend.execute() raise UnsupportedOperation for per-call
  env overrides (was a silent warning); session-level user_env preserved
- E2B/Vercel-Py/Vercel-TS/Modal: reject non-empty `dependencies` in
  build_backend
- E2B/Daytona/Modal/Vercel-Py/Vercel-TS/Deno: reject non-"none"
  `internet_access` in build_backend
- VercelPython/VercelTypescript/Deno configs: add Optional
  internet_access field for shape uniformity

Frontend (SandboxConfigDialog):
- formValuesToConfigPatch (extracted to utils.tsx for testability):
  deep-merge form state into a clone of the raw stored config; preserves
  capability-gated keys when flag is false or activeBackend is undefined,
  closing the False→True→False data-loss path
- Advanced Config JSON editor explicitly strips capability-gated keys on
  submit; structured editors are the only authors
- Hidden capability sections render a single muted "Not supported by the
  selected backend." placeholder instead of being omitted
- ALLOWLIST internet_access mode marked reserved-for-future

Tests:
- New parametrized test_unified_config_contract.py iterates over
  SANDBOX_ADAPTER_METADATA.keys() and enforces flag/config-model/runtime
  agreement for all three capabilities; new adapters must opt-in to pass
- test_capability_advertisement.py and test_queries.py refactored to
  parametrize from the same registry source — eliminates duplicate
  hard-coded matrices
- test_user_env_forwarding.py covers MODAL's new execute-time policy
- New formValuesToConfigPatch.test.ts covers the three preserve-on-save
  data-loss scenarios

Documents the contract on AdapterMetadata with cross-references from
SandboxAdapter.build_backend and validate_config so future adapters
inherit the rules from source.
Comment thread src/phoenix/server/sandbox/vercel_backend.py
Comment thread src/phoenix/server/api/mutations/sandbox_config_mutations.py
Comment thread src/phoenix/server/sandbox/__init__.py
…d WASM

The unified config contract tests expect adapters with dependencies_language=None
or internet_access="none" in SANDBOX_ADAPTER_METADATA to raise UnsupportedOperation
when build_backend is called with a conflicting config. DENO was missing the
packages guard; WASM was missing both packages and internet_access guards.
Mirrors the existing pattern in e2b/modal/vercel backends.
@claude
Copy link
Copy Markdown
Contributor

claude bot commented Apr 15, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:XXL This PR changes 1000+ lines, ignoring generated files.

Projects

Status: 📘 Todo

Development

Successfully merging this pull request may close these issues.

1 participant