feat: add langchain deepagents backend integration by johannhartmann · Pull Request #1 · mayflower/agent-sandbox

johannhartmann · 2026-01-25T17:06:23Z

- What: Adds the langchain-agent-sandbox integration package for LangChain DeepAgents, including the adapter implementation, tests, and docs.
- Why: Keeps LangChain/DeepAgents dependencies optional while enabling DeepAgents tooling for agent-sandbox users.
- How:
  - Implements SandboxBackendProtocol with protocol-compliant error mapping.
  - Adds unit tests + gated e2e test (via LANGCHAIN_* env vars).
  - Adds agentic_sandbox[langchain] extra and README usage snippet.
Testing
- ruff check …
- mypy (root mypy.ini)
- bandit -r …
- make test-langchain (optional, not run on CI unless configured)
- Optional e2e: python -m pytest test/e2e/clients/python/test_e2e_langchain_backend.py with LANGCHAIN_* vars
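The gated e2e test only runs when the LANGCHAIN_* variables are set. Below is a minimal sketch of such a gate. It assumes the variable names documented in test/e2e/README.md. The LangchainE2EConfig class, the load_langchain_e2e_config function, the "default" namespace fallback, and the accepted truthy spellings for LANGCHAIN_USE_TUNNEL are all hypothetical and are not taken from the PR.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LangchainE2EConfig:
    """Hypothetical container for the LANGCHAIN_* settings an e2e run needs."""

    template: str
    namespace: str
    gateway_name: Optional[str]
    api_url: Optional[str]
    use_tunnel: bool
    server_port: Optional[int]
    root_dir: Optional[str]


def load_langchain_e2e_config(
    env: Mapping[str, str] = os.environ,
) -> Optional[LangchainE2EConfig]:
    """Return None (meaning "skip the test") when LANGCHAIN_SANDBOX_TEMPLATE is unset."""
    template = env.get("LANGCHAIN_SANDBOX_TEMPLATE")
    if not template:
        return None
    port = env.get("LANGCHAIN_SERVER_PORT")
    return LangchainE2EConfig(
        template=template,
        # Assumed fallback; the real test may choose a different namespace.
        namespace=env.get("LANGCHAIN_NAMESPACE") or "default",
        gateway_name=env.get("LANGCHAIN_GATEWAY_NAME") or None,
        api_url=env.get("LANGCHAIN_API_URL") or None,
        # Assumed truthy spellings for the tunnel flag.
        use_tunnel=(env.get("LANGCHAIN_USE_TUNNEL") or "").lower() in {"1", "true", "yes"},
        server_port=int(port) if port else None,
        root_dir=env.get("LANGCHAIN_ROOT_DIR") or None,
    )
```

In a pytest suite, this helper could back a marker such as `pytest.mark.skipif(load_langchain_e2e_config() is None, reason="LANGCHAIN_SANDBOX_TEMPLATE not set")`. That marker reproduces the "skips silently" behavior described for test_e2e_langchain_backend.py.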

…uter (kubernetes-sigs#371) * fix: preserve query string when proxying requests in sandbox router * test: add unit test * chore * trigger ci

…kubernetes-sigs#511) * Updates deploy-to-kube script to give the options of deploying the controller with extensions installed * Updates the Makefile to use the new flag * Uses metadata name to identify controller instead of file name * Adds controller as a variable to deploy-kind * Enables extensions flag in the CI test suite

…ateStatu…" (kubernetes-sigs#526) This reverts commit 953032b.

…kubernetes-sigs#531)
* feat(python): include spec.lifecycle in SandboxClaim at creation time
  Add shutdown_after_seconds parameter to create_sandbox() so claims are expire-safe from birth. Previously, setting a TTL required a separate PATCH after creation, leaving a vulnerability window where a client crash could orphan claims with no expiration. The new keyword-only parameter computes a UTC shutdown time and includes spec.lifecycle (shutdownTime + shutdownPolicy: Delete) in the initial manifest. Validation rejects non-int, non-positive, and overflow values. Shared build_lifecycle() utility in lifecycle.py avoids drift between sync and async clients. No controller or CRD changes needed: the lifecycle field already exists and is read on every reconcile.
  Made-with: Cursor
* test: add integration test for lifecycle-at-creation code path
  Exercises the full path from create_sandbox(shutdown_after_seconds=N) through build_lifecycle(), _create_claim(), and K8sHelper down to the manifest body passed to the K8s API; only the API transport is mocked. Validates:
  - lifecycle dict appears in spec when shutdown_after_seconds is set
  - shutdownTime falls in the expected UTC window
  - shutdownPolicy is "Delete"
  - no lifecycle when shutdown_after_seconds is omitted
  - validation rejects invalid input before any K8s API call
  Made-with: Cursor
* fix: address PR review feedback
  - Rename build_lifecycle -> construct_sandbox_claim_lifecycle_spec, move from lifecycle.py to utils.py
  - Add docstrings for shutdown_after_seconds on both sync and async create_sandbox()
  - Add OTel span attributes for lifecycle shutdown_time and shutdown_policy in both sync and async _create_claim()
  - Strengthen return type hint to dict[str, str]
  - Simplify type check to `type(x) is not int`
  - Simplify lifecycle extraction in test (use keyword arg directly)
  - Remove unused mock_datetime.side_effect in test
  - Move timedelta import to top of integration test file
  Made-with: Cursor
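The following is a minimal sketch of what a helper like construct_sandbox_claim_lifecycle_spec might compute. It assumes an RFC 3339 timestamp with a `Z` suffix for shutdownTime. The `now` keyword (added for deterministic testing) and the exact exception types are assumptions, not the SDK's actual signature.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def construct_sandbox_claim_lifecycle_spec(
    shutdown_after_seconds: int, *, now: Optional[datetime] = None
) -> dict[str, str]:
    """Sketch: build spec.lifecycle for a SandboxClaim from a TTL in seconds."""
    # `type(x) is not int` also rejects bool, which is a subclass of int.
    if type(shutdown_after_seconds) is not int:
        raise TypeError("shutdown_after_seconds must be an int")
    if shutdown_after_seconds <= 0:
        raise ValueError("shutdown_after_seconds must be positive")
    base = now if now is not None else datetime.now(timezone.utc)
    try:
        # Huge values overflow timedelta or push the date past year 9999.
        shutdown_time = base + timedelta(seconds=shutdown_after_seconds)
    except OverflowError as exc:
        raise ValueError("shutdown_after_seconds is too large") from exc
    return {
        "shutdownTime": shutdown_time.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "shutdownPolicy": "Delete",
    }
```

Because the spec is computed before the claim is created, it can go into the initial manifest. No follow-up PATCH is needed, and there is no window in which the claim lacks an expiry.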

…tes-sigs#347) * Enable sandboxwarmpool on template updates * Fetch template once * Fix lint * Added additional test case checks * Add UpdateStrategy * fix:lint * Address comments * Update isSandboxStale * fix: ut * Add additional tc * Address comments * fix ut * Remove Semantic Equality check * Revert "Remove Semantic Equality check" This reverts commit 33d32d2. * Add semantic check * Add check in sandboxclaim * Check staleness for orphaned sandboxes * Remove check in sandboxclaim controller * nit

Adds clients/python/langchain-agent-sandbox, a Python implementation of the SandboxBackendProtocol from deepagents (>=0.5.0) that wraps a kubernetes-sigs/agent-sandbox Sandbox handle. An agent running through this backend executes shell commands and file operations inside a managed sandbox pod rather than on the host, while presenting the same contract as any other deepagents backend.

## Package contents

clients/python/langchain-agent-sandbox/:
- langchain_agent_sandbox/backend.py: AgentSandboxBackend class implementing every SandboxBackendProtocol method (execute, ls, read, write, edit, grep, glob, upload_files, download_files, plus all async variants). Also exports:
  - from_template() factory for lifecycle-managed sandboxes via direct / gateway / tunnel connection modes
  - SandboxPolicyWrapper (deny_prefixes, deny_commands, audit_log) for policy enforcement
  - WarmPoolBackend for warmpool-adopted sandboxes
  - create_sandbox_backend_factory() helper for `create_deep_agent(backend=...)`
- langchain_agent_sandbox/__init__.py: public exports
- tests/test_backend.py: 88 unit tests using a StubSandbox (SimpleNamespace + Mock), covering the protocol surface, path virtualization, policy wrapper, warm pool, and every fix below
- pyproject.toml: deepagents>=0.5.0 + k8s-agent-sandbox
- README.md, uv.lock

examples/langchain-deepagents/:
- main.py: minimal end-to-end example that runs a deepagents agent against a provisioned sandbox
- sandbox-template.yaml: a SandboxTemplate the example claims from
- README.md + run-test-kind.sh: kind workflow walkthrough
- .deepagents/skills/*: example skill files

test/e2e/clients/python/test_e2e_langchain_backend.py: env-gated kind integration test that exercises execute, write, read, edit, grep, glob, upload_files, and download_files against a real sandbox pod. Skips silently when LANGCHAIN_SANDBOX_TEMPLATE is unset.
## Repo integration

- Makefile: `test-langchain` target runs the unit suite (`uv run pytest clients/python/langchain-agent-sandbox/tests/ -v --junitxml=bin/langchain-backend-junit.xml`).
- dev/tools/test-e2e: `setup_python_sdk` pip-installs `langchain-agent-sandbox[test]` if the directory is present, and `run_python_e2e_tests` discovers the e2e test through the standard pytest invocation on `test/e2e/`.
- test/e2e/README.md: documents the `LANGCHAIN_SANDBOX_TEMPLATE`, `LANGCHAIN_NAMESPACE`, `LANGCHAIN_GATEWAY_NAME`, `LANGCHAIN_API_URL`, `LANGCHAIN_USE_TUNNEL`, `LANGCHAIN_SERVER_PORT`, and `LANGCHAIN_ROOT_DIR` env vars.
- examples/README.md: links to the new example.

## deepagents 0.5.x protocol compliance

deepagents 0.5.0 renamed the backend protocol method set and replaced plain returns with typed result dataclasses. This backend targets the new API from the start:
- ls_info -> ls returning LsResult
- grep_raw -> grep returning GrepResult
- glob_info -> glob returning GlobResult
- read returning ReadResult(file_data=FileData(content=..., encoding="utf-8")) with raw content (the middleware handles line numbering via format_content_with_line_numbers, so the backend returns unformatted output)
- WriteResult / EditResult constructed without the deprecated `files_update` kwarg (an explicit None emits a DeprecationWarning in 0.5.x)
- execute / aexecute accept a keyword-only `timeout: Optional[int] = None`, matching the new SandboxBackendProtocol signature

## Error-handling hardening

All error paths are surfaced through the typed result fields so the deepagents middleware can react without losing context:
- ls / grep / glob: sandbox-side command invocation is wrapped in try/except, and exceptions surface via Result(error="..."). On `exit_code != 0`, stderr is propagated into the error field alongside an empty entries/matches list rather than a stale stdout.
- read / edit: strict utf-8 decode (no `errors="replace"`), so non-UTF-8 files report a typed error instead of silently producing lossy content labelled as utf-8.
- read: empty files return empty content regardless of offset; offset >= len(lines) on a non-empty file returns ReadResult(error="Line offset N exceeds file length...").
- execute: distinguishes TimeoutError (exit_code=-2, output prefixed with "Timed out") from other failures (exit_code=-1, "Error:" prefix).

## Policy wrapper

SandboxPolicyWrapper wraps any AgentSandboxBackend and enforces three rules at call time:
- deny_prefixes (writes / edits / uploads): path-prefix deny list, canonicalized so traversal-style bypasses like `/app/../etc` are caught
- deny_commands (execute): substring match against a deny list; returns an ExecuteResponse with exit_code=1 and a "Policy denied" prefix
- audit_log: optional callback invoked with (operation, target, metadata) on every write / edit / execute / upload

Read operations pass through without checks.

## kind e2e

Running the Python e2e against a real kind cluster (`LANGCHAIN_SANDBOX_TEMPLATE=df-standard LANGCHAIN_NAMESPACE=darkfactory LANGCHAIN_USE_TUNNEL=1 KUBECONFIG=bin/KUBECONFIG`) exercises the full backend surface against a live sandbox pod:
- execute -> shell command round-trip
- write -> /langchain_e2e.txt created with 3 lines
- read -> content reflects the write
- edit(replace_all=False) -> single-occurrence replacement
- grep -> finds matches by literal pattern
- glob("**/langchain_e2e.txt") -> matches the file at the root of the search path
- upload_files([("/nested/dir/extra.txt", ...)]) -> creates the nested directory chain on demand and uploads the payload
- download_files -> round-trips the bytes

All paths are green. Four pre-existing bugs (three in the backend, one in the e2e test) surfaced during this run and are fixed in this commit:

1. The grep command appended `2>&1` as a shell redirect, but the sandbox runtime runs commands via subprocess.run + shlex.split (no shell). `2>&1` therefore became a literal grep argument: grep tried to open a file named `2>&1`, failed with exit 2, and the exit-code-based error detection flagged real matches as errors. Dropped the suffix; grep's stderr goes to the runtime's stderr channel.
2. glob's `**` support was broken. pathlib.PurePosixPath.match in Python 3.11 treats `**` as two consecutive `*` wildcards (a non-recursive, single-component match), NOT as a recursive globstar, so `**/target.txt` failed to match `target.txt` at the root. Replaced the PurePath.match call with a dedicated `_compile_glob` helper that translates the pattern to a regex with proper `**` handling (zero or more path components). Patterns without any `/` fall back to basename-only matching, so `glob("*.py")` still means "any .py in the tree".
3. upload_files refused paths with missing parent directories, returning `error="invalid_path"` instead of creating the parent chain on demand. write() already calls `_ensure_parent_dir` (mkdir -p) before uploading, so the two write APIs were inconsistent. upload_files now calls `_ensure_parent_dir` when parent_state is "missing".
4. test_e2e_langchain_backend.py used a stale `SandboxClient(template_name=..., namespace=..., gateway_name=..., ...)` constructor signature that the upstream k8s_agent_sandbox.SandboxClient no longer accepts. Switched to `AgentSandboxBackend.from_template()`, which wires the current SandboxClient API internally and presents the same option set.

## Test results

- 88 unit tests pass under `-W error::DeprecationWarning`
- Python e2e passes end-to-end against the real kind cluster (test_langchain_backend_basic)
- No `Any` / untyped leak points; type annotations throughout
- Apache-2.0 headers on every new file
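The read() rules in this PR can be sketched in isolation: strict decoding, empty files, an offset past the end, and raw unnumbered output. ReadOutcome is a hypothetical stand-in for deepagents' ReadResult. The default limit of 2000 lines and the error wording are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReadOutcome:
    """Stand-in for deepagents' ReadResult; these field names are hypothetical."""

    content: Optional[str] = None
    error: Optional[str] = None


def read_slice(raw: bytes, offset: int = 0, limit: int = 2000) -> ReadOutcome:
    """Decode strictly, then return lines [offset, offset + limit) without numbering."""
    try:
        text = raw.decode("utf-8")  # strict: no errors="replace"
    except UnicodeDecodeError as exc:
        return ReadOutcome(error=f"file is not valid UTF-8: {exc.reason} at byte {exc.start}")
    if not text:
        # Empty files return empty content regardless of offset.
        return ReadOutcome(content="")
    lines = text.splitlines(keepends=True)
    if offset >= len(lines):
        return ReadOutcome(error=f"offset {offset} is past the end of the file ({len(lines)} lines)")
    # Raw content only: line numbering is left to the middleware.
    return ReadOutcome(content="".join(lines[offset : offset + limit]))
```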
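Here is a rough sketch of the three policy checks. It makes two assumptions: prefix matching happens on whole path components (so /etc denies /etc/passwd but not /etcetera), and every execute attempt is audited with a `denied` flag in the metadata. PolicySketch and its methods are hypothetical and are not the SandboxPolicyWrapper API.

```python
import posixpath
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Optional, Sequence

AuditLog = Callable[[str, str, Mapping[str, Any]], None]


@dataclass
class PolicySketch:
    """Hypothetical re-creation of the checks SandboxPolicyWrapper performs."""

    deny_prefixes: Sequence[str] = ()
    deny_commands: Sequence[str] = ()
    audit_log: Optional[AuditLog] = None

    @staticmethod
    def _canonical(path: str) -> str:
        # Anchor at "/" and collapse "." and "..": "/app/../etc" -> "/etc".
        return posixpath.normpath("/" + path.lstrip("/"))

    def path_denied(self, path: str) -> bool:
        """Deny writes/edits/uploads under any prefix, on whole-component boundaries."""
        target = self._canonical(path)
        for prefix in self.deny_prefixes:
            base = self._canonical(prefix)
            if target == base or target.startswith(base.rstrip("/") + "/"):
                return True
        return False

    def execute_verdict(self, command: str) -> Optional[tuple[int, str]]:
        """Return (exit_code, output) for a denied command, or None to let it run."""
        hit = next((n for n in self.deny_commands if n in command), None)
        if self.audit_log is not None:
            self.audit_log("execute", command, {"denied": hit is not None})
        if hit is not None:
            return 1, f"Policy denied: command contains {hit!r}"
        return None
```

Plain substring matching is easy to sidestep, for example with extra whitespace or an alias. That is consistent with treating the wrapper as an application-layer guardrail rather than a security boundary.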
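The first two bugs can be reproduced with the standard library alone. This snippet shows the argv that shlex.split produces for the old grep command. It also shows how PurePosixPath.match handles a root-level file versus a nested one.

```python
import shlex
from pathlib import PurePosixPath

# Bug 1: subprocess.run(shlex.split(cmd)) never starts a shell, so a "2>&1"
# suffix is not a redirect; it survives as an ordinary argv entry, and grep
# then tries to open a file literally named "2>&1".
argv = shlex.split("grep -rn needle /app 2>&1")
print(argv)  # ['grep', '-rn', 'needle', '/app', '2>&1']

# Bug 2: PurePosixPath.match has no recursive "**"; it behaves like a
# single-component "*", and a pattern with more components than the path
# can never match.
print(PurePosixPath("target.txt").match("**/target.txt"))      # False
print(PurePosixPath("src/target.txt").match("**/target.txt"))  # True
```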
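Below is a self-contained sketch of a glob-to-regex translator with the `**` semantics described in this PR:
- Leading and middle `**` span zero or more directories.
- Trailing `**` requires something below the prefix (gitignore style).
- Patterns without a slash match basenames.

compile_glob is hypothetical and is not the PR's `_compile_glob`. It also treats `[` and `]` literally in the regex path.

```python
import fnmatch
import posixpath
import re
from typing import Callable


def _segment_regex(segment: str) -> str:
    """Translate one path segment; '*' and '?' never cross a '/'."""
    out = []
    for ch in segment:
        if ch == "*":
            out.append("[^/]*")
        elif ch == "?":
            out.append("[^/]")
        else:
            out.append(re.escape(ch))  # this sketch treats '[' and ']' literally
    return "".join(out)


def compile_glob(pattern: str) -> Callable[[str], bool]:
    """Return a predicate over paths relative to the search root."""
    if "/" not in pattern:
        # No '/': match the basename anywhere, so "*.py" means any .py in the tree.
        return lambda path: fnmatch.fnmatchcase(posixpath.basename(path), pattern)

    segments = pattern.strip("/").split("/")
    last = len(segments) - 1
    regex = ""
    for i, segment in enumerate(segments):
        if segment == "**":
            if i == last:
                # Trailing '**' (gitignore semantics): at least one character
                # below the prefix, so "a/**" matches "a/x" but not bare "a".
                regex += ".+"
            else:
                # Leading or middle '**': zero or more whole directories, so
                # "a/**/b" matches "a/b" and "a/x/y/b" but never "ab".
                regex += "(?:[^/]+/)*"
        else:
            regex += _segment_regex(segment) + ("" if i == last else "/")
    compiled = re.compile(regex)
    return lambda path: compiled.fullmatch(path) is not None
```

Compiling once per glob() call and reusing the predicate for every candidate path keeps the per-file cost to a single regex match.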

Eleven fixes layered on the existing PR. 123 unit tests pass; the kind e2e against a real cluster still passes. No public-API breakage.

Critical:
- execute() timeout detection was dead code: the SDK wraps requests.exceptions.Timeout into SandboxRequestError via `raise ... from e`, so it never matched `except TimeoutError`. The new _is_timeout_exception() walks __cause__/__context__ and detects builtin TimeoutError and requests/httpx Timeout, plus a duck-typed name fallback for future SDK exceptions that don't chain.
- _compile_glob's middle `**` matched bare `ab` for `a/**/b`. Rewrote the translator to handle leading, middle, and trailing `**` distinctly; trailing `**` follows the gitignore semantic (`a/**` rejects bare `a`).
- create_sandbox_backend_factory returned an un-entered backend, so the first call raised AttributeError. It now eagerly enters the backend and registers a weakref.finalize for GC/shutdown teardown. The handle is exposed as backend._finalizer for deterministic test invocation.

Important:
- __exit__ silently swallowed delete failures. It now re-raises on the happy path; on user-exception unwind it raises a BaseExceptionGroup so neither the user error nor the leak signal is lost.
- _factory_atexit_cleanup blanket-swallowed every error. It now filters only HTTP 404 (the redundant-cleanup case) and logs everything else at ERROR with a shutdown-safe inner guard.
- SandboxPolicyWrapper gained `strict_audit: bool = False` (kw-only). When True, audit-callback failures refuse the operation, and the deny detail propagates through the write/edit/upload result fields instead of flattening to "policy_denied". Refactored four inline audit blocks into a single _emit_audit() helper. Default behavior unchanged.
- _emit_audit log lines now include operation, target, metadata, and exc_info=True for SRE diagnosability.
- glob() catches non-re.error compilation failures (IndexError, TypeError, RecursionError) and returns a typed GlobResult.
- grep() error detail reads stderr first (with stdout/exit-code fallbacks) instead of the always-empty stdout.
- SandboxPolicyWrapper docstring rewritten to drop "enterprise-grade restrictions": it's an application-layer guardrail, not a security boundary. The runtime (gVisor/Kata) is the boundary.

Tests: 88 -> 123. New coverage for _compile_glob branches (10 tests), wrapped-timeout detection (a real ReadTimeout, not a mock), factory eager-enter and finalizer (404 vs non-404), the __exit__ ExceptionGroup contract, strict_audit on execute/write/edit/upload, parametrized fail-open coverage for write/edit/upload, and grep stderr error detail.
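The timeout fix depends on walking the exception chain instead of relying on `except TimeoutError`. Below is a minimal sketch that uses a duck-typed check on the class name as the fallback. is_timeout_exception, failure_response, and the two stand-in exception classes are hypothetical. The exit codes follow the -2 / -1 convention this PR describes for execute().

```python
from typing import Optional


def is_timeout_exception(exc: Optional[BaseException]) -> bool:
    """Walk the __cause__/__context__ chain looking for a timeout."""
    seen: set[int] = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        if isinstance(exc, TimeoutError):
            return True
        # Duck-typed fallback: requests (Timeout, ReadTimeout, ConnectTimeout)
        # and httpx (ReadTimeout, TimeoutException, ...) all put "Timeout" in
        # the class name, so neither library has to be imported here.
        if "Timeout" in type(exc).__name__:
            return True
        exc = exc.__cause__ or exc.__context__
    return False


def failure_response(exc: BaseException) -> tuple[int, str]:
    """Map a failure to (exit_code, output)."""
    if is_timeout_exception(exc):
        return -2, f"Timed out: {exc}"
    return -1, f"Error: {exc}"


# Simulating the SDK's wrapping with stand-in classes (not the real ones).
class ReadTimeout(Exception):
    """Stands in for requests.exceptions.ReadTimeout."""


class SandboxRequestError(Exception):
    """Stands in for the SDK's wrapper exception."""


def call_sdk() -> None:
    try:
        raise ReadTimeout("read timed out")
    except ReadTimeout as e:
        raise SandboxRequestError("request failed") from e


try:
    call_sdk()
except TimeoutError:
    print("never reached: the wrapper is not a TimeoutError")
except SandboxRequestError as err:
    print(failure_response(err))  # (-2, 'Timed out: request failed')
```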
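The factory fix pairs an eager `__enter__` with weakref.finalize. The finalize callback runs at most once: when the backend is garbage-collected, when the finalizer is called explicitly, or at interpreter exit. The sketch below works under those assumptions. SketchBackend, make_backend, cleanup_sandbox, and the `status` attribute on ApiError are hypothetical.

```python
import logging
import weakref
from typing import Callable, Optional

log = logging.getLogger("backend_factory_sketch")


class ApiError(Exception):
    """Hypothetical API error carrying an HTTP status code."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def cleanup_sandbox(delete: Callable[[], None]) -> None:
    """Finalizer body: a 404 means the sandbox is already gone; anything else is logged."""
    try:
        delete()
    except Exception as exc:
        if getattr(exc, "status", None) == 404:
            return  # redundant cleanup, nothing leaked
        try:
            log.error("sandbox cleanup failed", exc_info=True)
        except Exception:
            pass  # logging itself can fail late in interpreter shutdown


class SketchBackend:
    """Hypothetical backend that must be entered before use."""

    def __init__(self, delete: Callable[[], None]) -> None:
        self._delete = delete
        self.entered = False
        self._finalizer: Optional[weakref.finalize] = None

    def __enter__(self) -> "SketchBackend":
        self.entered = True
        return self


def make_backend(delete: Callable[[], None]) -> SketchBackend:
    backend = SketchBackend(delete).__enter__()  # eager enter: usable immediately
    # The callback and its args must not reference `backend`, or it could never
    # be collected. finalize also runs at interpreter exit (atexit=True default).
    backend._finalizer = weakref.finalize(backend, cleanup_sandbox, delete)
    return backend
```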
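This sketch shows a single audit helper with a `strict_audit` switch. By default, a failing callback is logged and the operation proceeds (fail open). With `strict_audit=True`, the failure becomes a denial reason that a caller could place in a result's error field. emit_audit, its return convention, and the ERROR log level are assumptions.

```python
import logging
from typing import Any, Callable, Mapping, Optional

logger = logging.getLogger("policy_audit_sketch")
AuditCallback = Callable[[str, str, Mapping[str, Any]], None]


def emit_audit(
    callback: Optional[AuditCallback],
    operation: str,
    target: str,
    metadata: Mapping[str, Any],
    *,
    strict_audit: bool = False,
) -> Optional[str]:
    """Run the audit callback; return a denial reason if the operation must be refused."""
    if callback is None:
        return None
    try:
        callback(operation, target, metadata)
    except Exception as exc:
        # Include what an SRE needs to locate the failing call.
        logger.error(
            "audit callback failed: operation=%s target=%s metadata=%r",
            operation, target, metadata, exc_info=True,
        )
        if strict_audit:
            return f"audit callback failed: {exc}"  # fail closed, keeping the detail
    return None  # default: fail open
```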

johannhartmann force-pushed the feat-langchain-deepagents-backend branch 2 times, most recently from a943354 to d750e34 on January 25, 2026 18:41

johannhartmann force-pushed the feat-langchain-deepagents-backend branch 3 times, most recently from a52322f to d3df291 on February 7, 2026 13:47

johannhartmann force-pushed the feat-langchain-deepagents-backend branch 2 times, most recently from 26dd56c to 015ad58 on February 11, 2026 09:22

johannhartmann force-pushed the feat-langchain-deepagents-backend branch from f202be6 to f7f6921 on February 19, 2026 18:10

johannhartmann force-pushed the feat-langchain-deepagents-backend branch 3 times, most recently from daeb330 to 4bdb0c4 on March 13, 2026 07:37

johannhartmann force-pushed the feat-langchain-deepagents-backend branch from 4bdb0c4 to 09e0021 on March 24, 2026 12:52

johannhartmann force-pushed the feat-langchain-deepagents-backend branch 2 times, most recently from 7b91eeb to 6561716 on April 7, 2026 16:50

xiaoj655 and others added 7 commits April 7, 2026 23:17

fix: query parameters are lost when proxying requests in a sandbox ro…

4b57438


Revert "optimization: replace .Update() with .Patch() for sandbox upd…

2969423


Implement suspend and resume with snapshots. (kubernetes-sigs#541)

059575c

johannhartmann force-pushed the feat-langchain-deepagents-backend branch 2 times, most recently from 2012af5 to 40e1009 on April 8, 2026 13:50

johannhartmann wants to merge 8 commits into main from feat-langchain-deepagents-backend

johannhartmann commented Jan 25, 2026


7 participants
