feat: add langchain deepagents backend integration #1
Open
johannhartmann wants to merge 8 commits into main
…uter (kubernetes-sigs#371)
* fix: preserve query string when proxying requests in sandbox router
* test: add unit test
* chore
* trigger ci

…kubernetes-sigs#511)
* Updates deploy-to-kube script to give the options of deploying the controller with extensions installed
* Updates the Makefile to use the new flag
* Uses metadata name to identify controller instead of file name
* Adds controller as a variable to deploy-kind
* Enables extensions flag in the CI test suite

…ateStatu…" (kubernetes-sigs#526)
This reverts commit 953032b.

…kubernetes-sigs#531)

* feat(python): include spec.lifecycle in SandboxClaim at creation time

Add shutdown_after_seconds parameter to create_sandbox() so claims are expire-safe from birth. Previously, setting a TTL required a separate PATCH after creation, leaving a vulnerability window where a client crash could orphan claims with no expiration.

The new keyword-only parameter computes a UTC shutdown time and includes spec.lifecycle (shutdownTime + shutdownPolicy: Delete) in the initial manifest. Validation rejects non-int, non-positive, and overflow values.

Shared build_lifecycle() utility in lifecycle.py avoids drift between sync and async clients.

No controller or CRD changes needed: the lifecycle field already exists and is read on every reconcile.

Made-with: Cursor

* test: add integration test for lifecycle-at-creation code path

Exercises the full path from create_sandbox(shutdown_after_seconds=N) through build_lifecycle(), _create_claim(), and K8sHelper down to the manifest body passed to the K8s API; only the API transport is mocked.

Validates:
- lifecycle dict appears in spec when shutdown_after_seconds is set
- shutdownTime falls in the expected UTC window
- shutdownPolicy is "Delete"
- no lifecycle when shutdown_after_seconds is omitted
- validation rejects invalid input before any K8s API call

Made-with: Cursor

* fix: address PR review feedback

- Rename build_lifecycle -> construct_sandbox_claim_lifecycle_spec, move from lifecycle.py to utils.py
- Add docstrings for shutdown_after_seconds on both sync and async create_sandbox()
- Add OTel span attributes for lifecycle shutdown_time and shutdown_policy in both sync and async _create_claim()
- Strengthen return type hint to dict[str, str]
- Simplify type check to `type(x) is not int`
- Simplify lifecycle extraction in test (use keyword arg directly)
- Remove unused mock_datetime.side_effect in test
- Move timedelta import to top of integration test file

Made-with: Cursor
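
For readers who want to see the lifecycle-at-creation idea concretely, here is a minimal sketch of the validation and UTC shutdown-time computation described above. The function name `lifecycle_spec_sketch`, the injectable `now` argument, and the timestamp format are assumptions made for illustration; the actual `construct_sandbox_claim_lifecycle_spec` helper may differ.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from typing import Optional


def lifecycle_spec_sketch(
    shutdown_after_seconds: int, now: Optional[datetime] = None
) -> dict[str, str]:
    """Hypothetical stand-in for construct_sandbox_claim_lifecycle_spec."""
    # Exact type check: bool is a subclass of int, so isinstance() would accept True.
    if type(shutdown_after_seconds) is not int:
        raise TypeError("shutdown_after_seconds must be an int")
    if shutdown_after_seconds <= 0:
        raise ValueError("shutdown_after_seconds must be positive")
    current = now if now is not None else datetime.now(timezone.utc)
    try:
        shutdown_time = current + timedelta(seconds=shutdown_after_seconds)
    except OverflowError as exc:
        # Raised when the delta or the resulting date is out of range.
        raise ValueError("shutdown_after_seconds is too large") from exc
    return {
        # Assumed timestamp format (RFC 3339, UTC); not confirmed by the source.
        "shutdownTime": shutdown_time.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "shutdownPolicy": "Delete",
    }
```

Because the spec is computed before the claim is created, it can be placed in the initial manifest, which is what removes the window between creation and a follow-up PATCH.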

…tes-sigs#347)
* Enable sandboxwarmpool on template updates
* Fetch template once
* Fix lint
* Added additional test case checks
* Add UpdateStrategy
* fix:lint
* Address comments
* Update isSandboxStale
* fix: ut
* Add additional tc
* Address comments
* fix ut
* Remove Semantic Equality check
* Revert "Remove Semantic Equality check" This reverts commit 33d32d2.
* Add semantic check
* Add check in sandboxclaim
* Check staleness for orphaned sandboxes
* Remove check in sandboxclaim controller
* nit

Adds clients/python/langchain-agent-sandbox, a Python implementation
of the deepagents (>=0.5.0) SandboxBackendProtocol that wraps a
kubernetes-sigs/agent-sandbox Sandbox handle. An agent running
through this backend executes shell commands and file operations
inside a managed sandbox pod rather than on the host, while
presenting the same contract as any other deepagents backend.
## Package contents
clients/python/langchain-agent-sandbox/:
- langchain_agent_sandbox/backend.py AgentSandboxBackend class
implementing every SandboxBackendProtocol method (execute, ls,
read, write, edit, grep, glob, upload_files, download_files,
plus all async variants). Also exports:
- from_template() factory for lifecycle-managed sandboxes via
direct / gateway / tunnel connection modes
- SandboxPolicyWrapper (deny_prefixes, deny_commands,
audit_log) for policy enforcement
- WarmPoolBackend for warmpool-adopted sandboxes
- create_sandbox_backend_factory() helper for
`create_deep_agent(backend=...)`
- langchain_agent_sandbox/__init__.py public exports
- tests/test_backend.py 88 unit tests using a StubSandbox
(SimpleNamespace + Mock), covering the protocol surface, path
virtualization, policy wrapper, warm pool, and every fix below
- pyproject.toml deepagents>=0.5.0 + k8s-agent-sandbox
- README.md, uv.lock
examples/langchain-deepagents/:
- main.py minimal end-to-end example that runs a deepagents
agent against a provisioned sandbox
- sandbox-template.yaml a SandboxTemplate the example claims
from
- README.md + run-test-kind.sh kind workflow walkthrough
- .deepagents/skills/* example skill files
test/e2e/clients/python/test_e2e_langchain_backend.py: env-gated
kind integration test that exercises execute, write, read, edit,
grep, glob, upload_files, and download_files against a real
sandbox pod. Skips silently when LANGCHAIN_SANDBOX_TEMPLATE is
unset.
## Repo integration
- Makefile: `test-langchain` target runs the unit suite
(`uv run pytest clients/python/langchain-agent-sandbox/tests/
-v --junitxml=bin/langchain-backend-junit.xml`).
- dev/tools/test-e2e: `setup_python_sdk` pip-installs
`langchain-agent-sandbox[test]` if the directory is present,
and `run_python_e2e_tests` discovers the e2e test through the
standard pytest invocation on `test/e2e/`.
- test/e2e/README.md: documents the `LANGCHAIN_SANDBOX_TEMPLATE`,
`LANGCHAIN_NAMESPACE`, `LANGCHAIN_GATEWAY_NAME`,
`LANGCHAIN_API_URL`, `LANGCHAIN_USE_TUNNEL`,
`LANGCHAIN_SERVER_PORT`, and `LANGCHAIN_ROOT_DIR` env vars.
- examples/README.md: links to the new example.
## deepagents 0.5.x protocol compliance
deepagents 0.5.0 renamed the backend protocol method set and
replaced plain returns with typed result dataclasses. This
backend targets the new API from the start:
- ls_info -> ls returning LsResult
- grep_raw -> grep returning GrepResult
- glob_info -> glob returning GlobResult
- read returning ReadResult(file_data=FileData(content=...,
encoding="utf-8")) with raw content (the middleware handles
line numbering via format_content_with_line_numbers, so the
backend returns unformatted output)
- WriteResult / EditResult constructed without the deprecated
`files_update` kwarg (explicit None emits a DeprecationWarning
in 0.5.x)
- execute / aexecute accept a keyword-only `timeout: Optional[int]
= None` matching the new SandboxBackendProtocol signature
## Error-handling hardening
All error paths are surfaced through the typed result fields so
the deepagents middleware can react without losing context:
- ls / grep / glob: sandbox-side command invocation is wrapped in
try/except and exceptions surface via
Result(error="..."). On `exit_code != 0` the stderr is
propagated into the error field alongside an empty entries/
matches list rather than a stale stdout.
- read / edit: strict utf-8 decode (no `errors="replace"`) so
non-UTF-8 files report a typed error instead of silently
producing lossy content labelled as utf-8.
- read: empty files return empty content regardless of offset;
offset >= len(lines) on a non-empty file returns
ReadResult(error="Line offset N exceeds file length...").
- execute: distinguishes TimeoutError (exit_code=-2, output
prefixed with "Timed out") from other failures (exit_code=-1,
"Error:" prefix).
## Policy wrapper
SandboxPolicyWrapper wraps any AgentSandboxBackend and enforces
three rules at call time:
- deny_prefixes (writes / edits / uploads): path-prefix deny
list, canonicalized so traversal-style bypasses like
`/app/../etc` are caught
- deny_commands (execute): substring match against a deny list;
returns ExecuteResponse with exit_code=1 and a
"Policy denied" prefix
- audit_log: optional callback invoked with
(operation, target, metadata) on every write / edit / execute /
upload
Read operations pass through without checks.
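
A rough sketch of the two deny checks, to show why canonicalization matters for the prefix rule. The helper names `is_path_denied` and `is_command_denied` are hypothetical, and the normalization is purely lexical (it does not resolve symlinks); the wrapper's real logic may differ.

```python
import posixpath
from typing import Sequence


def _canonical(path: str) -> str:
    # Strip leading slashes first: posixpath.normpath preserves a leading "//".
    return posixpath.normpath("/" + path.lstrip("/"))


def is_path_denied(path: str, deny_prefixes: Sequence[str]) -> bool:
    """Hypothetical helper: True if the canonical path falls under a denied prefix."""
    canonical = _canonical(path)
    for prefix in deny_prefixes:
        root = _canonical(prefix)
        # Compare on component boundaries so "/etc" does not match "/etcetera".
        if canonical == root or canonical.startswith(root.rstrip("/") + "/"):
            return True
    return False


def is_command_denied(command: str, deny_commands: Sequence[str]) -> bool:
    """Hypothetical helper: plain substring match, as the description states."""
    return any(token in command for token in deny_commands)
```

With these helpers, `is_path_denied("/app/../etc/passwd", ["/etc"])` is true because the path normalizes to `/etc/passwd` before the comparison. Substring matching on commands is easy to sidestep (quoting, aliases, indirection), which fits a guardrail rather than a security boundary.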
## kind e2e
Running the Python e2e against a real kind cluster
(`LANGCHAIN_SANDBOX_TEMPLATE=df-standard
LANGCHAIN_NAMESPACE=darkfactory LANGCHAIN_USE_TUNNEL=1
KUBECONFIG=bin/KUBECONFIG`) exercises the full backend surface
against a live sandbox pod:
- execute -> shell command round-trip
- write -> /langchain_e2e.txt created with 3 lines
- read -> content reflects the write
- edit(replace_all=False) -> single-occurrence replacement
- grep -> finds matches by literal pattern
- glob("**/langchain_e2e.txt") -> matches the file at the root
of the search path
- upload_files([("/nested/dir/extra.txt", ...)]) -> creates the
nested directory chain on demand and uploads the payload
- download_files -> round-trips the bytes
All paths green. Four pre-existing backend bugs surfaced during
this run and are fixed in this commit:
1. grep command appended `2>&1` as a shell redirect, but the
sandbox runtime runs commands via subprocess.run + shlex.split
(no shell). `2>&1` became a literal grep argument, grep tried
to open a file named `2>&1`, failed with exit 2, and the
exit-code-based error detection flagged real matches as
errors. Dropped the suffix; grep's stderr goes to the runtime's
stderr channel.
2. glob's `**` support was broken.
pathlib.PurePosixPath.match in Python 3.11 treats `**` as two
consecutive `*` wildcards, NOT as recursive globstar, so
`**/target.txt` failed to match `target.txt` at the root.
Replaced the PurePath.match call with a dedicated
`_compile_glob` helper that translates the pattern to a regex
with proper `**` handling (zero-or-more path components).
Patterns without any `/` fall back to basename-only matching
so `glob("*.py")` still means "any .py in the tree".
3. upload_files refused paths with missing parent directories,
returning `error="invalid_path"` instead of creating the
parent chain on demand. write() already calls
`_ensure_parent_dir` (mkdir -p) before uploading, so the two
write APIs were inconsistent. upload_files now calls
`_ensure_parent_dir` when parent_state is "missing".
4. test_e2e_langchain_backend.py used a stale
`SandboxClient(template_name=..., namespace=..., gateway_name=
..., ...)` constructor signature that the upstream
k8s_agent_sandbox.SandboxClient no longer accepts. Switched to
`AgentSandboxBackend.from_template()` which wires the current
SandboxClient API internally and presents the same option
set.
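
To make fix 2 concrete, here is a minimal sketch of a glob-to-regex translator with recursive `**` handling and the basename-only fallback. It is not the PR's `_compile_glob`: the name `compile_glob_sketch` is hypothetical, only `*`, `?`, and `**` are supported (no character classes), paths are assumed to be relative with `/` separators, and a trailing `**` is assumed to require at least one component beneath the directory.

```python
import re


def _segment_to_regex(segment: str) -> str:
    # Inside one path component, `*` and `?` never cross a "/".
    out = []
    for ch in segment:
        if ch == "*":
            out.append("[^/]*")
        elif ch == "?":
            out.append("[^/]")
        else:
            out.append(re.escape(ch))
    return "".join(out)


def compile_glob_sketch(pattern: str) -> re.Pattern:
    parts = pattern.split("/")
    if len(parts) == 1:
        # No "/" in the pattern: match the basename anywhere in the tree.
        return re.compile(r"(?:.*/)?" + _segment_to_regex(pattern) + r"\Z")
    regex = ""
    last = len(parts) - 1
    for i, part in enumerate(parts):
        if part == "**":
            if i == last:
                regex += r".+"  # trailing: something must exist below
            else:
                # Zero or more whole components, each carrying its own slash,
                # so "a/**/b" matches "a/b" but never "ab".
                regex += r"(?:[^/]+/)*"
        else:
            regex += _segment_to_regex(part)
            if i != last:
                regex += "/"
    return re.compile(regex + r"\Z")
```

For example, `compile_glob_sketch("**/target.txt").match("target.txt")` succeeds, which is the case the `PurePosixPath.match` approach missed.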
## Test results
- 88 unit tests pass under `-W error::DeprecationWarning`
- Python e2e passes end-to-end against the real kind cluster
(test_langchain_backend_basic)
- No `any` / untyped leak points; type annotations are used throughout
- Apache-2.0 headers on every new file
Eleven fixes layered on the existing PR. 123 unit tests pass; kind e2e against a real cluster still passes. No public-API breakage.

Critical:
- execute() timeout detection was dead code: the SDK wraps requests.exceptions.Timeout into SandboxRequestError via `raise ... from e`, never matching `except TimeoutError`. New _is_timeout_exception() walks __cause__/__context__ and detects builtin TimeoutError, requests/httpx Timeout, plus a duck-typed name fallback for future SDK exceptions that don't chain.
- _compile_glob middle-`**` matched bare `ab` for `a/**/b`. Rewrote the translator to handle leading/middle/trailing `**` distinctly; trailing follows the gitignore semantic (`a/**` rejects bare `a`).
- create_sandbox_backend_factory returned an un-entered backend, so the first call AttributeError'd. Now eagerly enters and registers a weakref.finalize for GC/shutdown teardown. The handle is exposed as backend._finalizer for deterministic test invocation.

Important:
- __exit__ silently swallowed delete failures. Now re-raises on the happy path; on user-exception unwind raises BaseExceptionGroup so neither the user error nor the leak signal is lost.
- _factory_atexit_cleanup blanket-swallowed every error. Now filters only HTTP 404 (the redundant-cleanup case) and logs everything else at ERROR with a shutdown-safe inner guard.
- SandboxPolicyWrapper gained `strict_audit: bool = False` (kw-only). When True, audit-callback failures refuse the operation; deny detail propagates through write/edit/upload result fields instead of flattening to "policy_denied". Refactored four inline audit blocks into a single _emit_audit() helper. Default unchanged.
- _emit_audit log lines now include operation, target, metadata, and exc_info=True for SRE diagnosability.
- glob() catches non-re.error compilation failures (IndexError, TypeError, RecursionError) and returns a typed GlobResult.
- grep() error detail reads stderr first (with stdout/exit-code fallbacks) instead of always-empty stdout.
- SandboxPolicyWrapper docstring rewritten to drop "enterprise-grade restrictions": it's an application-layer guardrail, not a security boundary. The runtime (gVisor/Kata) is the boundary.

Tests: 88 -> 123. New coverage for _compile_glob branches (10 tests), wrapped-timeout detection (real ReadTimeout, not a mock), factory eager-enter and finalizer (404 vs non-404), __exit__ ExceptionGroup contract, strict_audit on execute/write/edit/upload, parametrized fail-open coverage for write/edit/upload, and grep stderr error detail.
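
The wrapped-timeout problem from the first Critical item can be illustrated with a small sketch. It is not the PR's `_is_timeout_exception`: the names below are hypothetical, and it covers only the builtin `TimeoutError` plus a class-name fallback, without importing requests or httpx.

```python
from typing import Optional


def is_timeout_exception_sketch(exc: Optional[BaseException]) -> bool:
    """Hypothetical helper: True if exc or anything it chains to looks like a timeout."""
    seen = set()
    stack = [exc]
    while stack:
        current = stack.pop()
        if current is None or id(current) in seen:
            continue  # skip missing links and guard against cyclic chains
        seen.add(id(current))
        if isinstance(current, TimeoutError):
            return True
        # Duck-typed fallback: class names such as ReadTimeout or ConnectTimeout.
        if "timeout" in type(current).__name__.lower():
            return True
        stack.append(current.__cause__)    # set by `raise ... from e`
        stack.append(current.__context__)  # set implicitly inside `except`
    return False


class WrappedRequestError(Exception):
    """Hypothetical stand-in for an SDK wrapper exception."""


def call_sdk() -> None:
    # Mimics an SDK that hides the original timeout behind its own error type.
    try:
        raise TimeoutError("read timed out")
    except TimeoutError as e:
        raise WrappedRequestError("request failed") from e
```

Here a plain `except TimeoutError` around `call_sdk()` never fires because the raised type is `WrappedRequestError`; walking the chain recovers the original cause.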
What: Adds langchain-agent-sandbox integration package for LangChain DeepAgents, including adapter implementation, tests, and docs.
Testing