
Add LangSmith tracing plugin for Temporal workflows #1369

Open
xumaple wants to merge 43 commits into main from maplexu/langsmith-plugin

Conversation

xumaple commented Mar 17, 2026

Summary

  • Adds temporalio.contrib.langsmith plugin that creates LangSmith trace hierarchies for Temporal operations (workflows, activities, signals, queries, updates, child workflows, Nexus)
  • Supports ambient @traceable context propagation through Temporal headers, replay-safe tracing, and an add_temporal_runs toggle for lightweight context-only mode
  • 48 tests covering unit, integration, and comprehensive end-to-end scenarios

🤖 Generated with Claude Code

CLAassistant commented Mar 17, 2026

CLA assistant check
All committers have signed the CLA.

xumaple force-pushed the maplexu/langsmith-plugin branch 2 times, most recently from 2803b95 to 768ac70 (March 17, 2026 17:22)
xumaple and others added 20 commits March 30, 2026 16:09
Implements a LangSmith contrib plugin that creates trace hierarchies
for Temporal operations (workflows, activities, signals, queries,
updates, child workflows, Nexus). Supports ambient @traceable context
propagation, replay-safe tracing, and an add_temporal_runs toggle for
lightweight context-only mode.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…late

- Add ReplaySafeRunTree wrapper that handles replay skipping and sandbox
  safety (post/end/patch no-op during replay, sandbox_unrestricted in
  workflow context), inspired by OTel plugin's _ReplaySafeSpan pattern
- Add config.maybe_run() to eliminate repeated config kwargs at every
  call site
- Add _traced_call (client outbound) and _traced_outbound (workflow
  outbound) helpers to reduce interceptor methods to one-liners
- Fold _extract_context into _workflow_maybe_run for workflow inbound
- Remove _safe_post, _safe_patch helpers (internalized in wrapper)
- Remove in_workflow parameter from _maybe_run (wrapper detects it)
- Establish consistent wrapping invariant: all run references are
  ReplaySafeRunTree, unwrapping is unconditional ._run at RunTree
  constructor boundary
- Parametrize redundant unit tests (client outbound, workflow
  inbound/outbound) and remove duplicate test
- Remove _make_interceptor test helper, use LangSmithInterceptor directly
- Collapse plugin constructor tests into one, add comprehensive plugin
  integration test, remove redundant sandbox tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
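
The replay-skipping wrapper described in this commit can be sketched as below. This is a minimal illustration under stated assumptions, not the plugin's code: `FakeRun` and `ReplaySafeRun` are hypothetical stand-ins, and the replay check is injected as a plain callable rather than read from the Temporal SDK.

```python
from typing import Callable


class FakeRun:
    """Hypothetical stand-in for a run object; records calls instead of doing I/O."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def post(self) -> None:
        self.calls.append("post")

    def end(self) -> None:
        self.calls.append("end")

    def patch(self) -> None:
        self.calls.append("patch")


class ReplaySafeRun:
    """Delegates to the wrapped run, but skips side effects while replaying."""

    def __init__(self, run: FakeRun, is_replaying: Callable[[], bool]) -> None:
        self._run = run
        self._is_replaying = is_replaying

    def post(self) -> None:
        if not self._is_replaying():
            self._run.post()

    def end(self) -> None:
        if not self._is_replaying():
            self._run.end()

    def patch(self) -> None:
        if not self._is_replaying():
            self._run.patch()


# Demo: calls made during replay are dropped, later calls go through.
replaying = {"value": True}
inner = FakeRun()
safe = ReplaySafeRun(inner, lambda: replaying["value"])
safe.post()
safe.patch()
calls_during_replay = list(inner.calls)

replaying["value"] = False
safe.post()
safe.end()
safe.patch()
calls_after_replay = list(inner.calls)
```

Keeping the check inside the wrapper means call sites never branch on replay state themselves, which is the "consistent wrapping invariant" the commit message refers to.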
Fix ruff I001 import sorting violations in _interceptor.py and
test_integration.py. Extract _get_current_run_safe() helper for
reading ambient LangSmith context with replay safety.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Change add_temporal_runs default to False in both plugin and
  interceptor (reviewer preference for opt-in behavior)
- Rename plugin to langchain.LangSmithPlugin per organization.PluginName
  convention
- Prefix header key with _temporal- to avoid collisions
- Update all tests to explicitly pass add_temporal_runs=True

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add @traceable call (outer_chain) directly in ComprehensiveWorkflow
  to test non-deterministic tracing alongside deterministic replay
- Set max_cached_workflows=0 on all test workers to force replay on
  every workflow task, exposing header non-determinism
- Restructure comprehensive tests with mid-workflow worker restart:
  one shared collector across two worker lifetimes proves context
  propagates via headers, not cached plugin state
- Add is_waiting_for_signal query and poll helper for deterministic
  sync (no arbitrary sleeps)
- Consolidate make_mock_ls_client in conftest.py, remove unused
  fixtures, use raw client for polling to avoid trace contamination
- Tests are expected to fail (TDD): sandbox blocks @traceable in
  workflows, max_cached_workflows=0 exposes outputs=None on eviction

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
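
A poll helper of the kind mentioned above might look like the following sketch. It assumes the query is exposed as an async callable returning a bool; `poll_until` and `FakeWorkflow` are hypothetical names, and the fake stands in for a real workflow handle. The helper still sleeps briefly between polls, but it returns as soon as the condition holds instead of waiting a fixed, guessed duration.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def poll_until(
    condition: Callable[[], Awaitable[bool]],
    timeout: float = 5.0,
    interval: float = 0.01,
) -> None:
    """Re-evaluate `condition` until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while not await condition():
        if time.monotonic() >= deadline:
            raise TimeoutError("condition was not met before the deadline")
        await asyncio.sleep(interval)


class FakeWorkflow:
    """Hypothetical stand-in for a workflow handle with a boolean query."""

    def __init__(self) -> None:
        self.polls = 0

    async def is_waiting_for_signal(self) -> bool:
        self.polls += 1
        return self.polls >= 3  # becomes True on the third poll


fake = FakeWorkflow()
asyncio.run(poll_until(fake.is_waiting_for_signal))
```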
Move RunTree.post()/patch() I/O off the workflow task thread to a
single-worker ThreadPoolExecutor, preventing deadlocks from
compressed_traces.lock contention with the LangSmith drain thread.

Key changes:
- _ReplaySafeRunTree.create_child() override propagates replay safety
  and deterministic IDs to nested @langsmith.traceable calls
- Executor-backed post()/patch() with FIFO ordering and fire-and-forget
  error logging via Future.add_done_callback
- _ContextBridgeRunTree for add_temporal_runs=False without external
  context: an invisible parent that produces root @traceable runs
- aio_to_thread patch simplified: removed harmful replay-time tracing
  disable, added error gate for async @traceable without plugin
- Plugin shutdown via SimplePlugin.run_context instead of dead method
- Fix misleading comments referencing test artifacts instead of
  production reasons, remove OTel cross-references
- Strict dump_runs catches dangling parent_run_id references
- Add **/CLAUDE.md to .gitignore

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
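
The executor-backed pattern in this commit (single worker for FIFO ordering, fire-and-forget error logging through `Future.add_done_callback`) can be sketched with the standard library alone. The `submit` helper and the logger name are assumptions for illustration; the plugin's own `_submit` may differ.

```python
import logging
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable

logger = logging.getLogger("tracing_io_sketch")


def _log_failure(future: "Future[None]") -> None:
    """Done-callback: log errors instead of raising into the caller."""
    if future.cancelled():
        return
    exc = future.exception()
    if exc is not None:
        logger.error("background trace I/O failed: %r", exc)


def submit(executor: ThreadPoolExecutor, fn: Callable[[], None]) -> "Future[None]":
    """Queue `fn` on the executor and attach the error-logging callback."""
    future = executor.submit(fn)
    future.add_done_callback(_log_failure)
    return future


def failing_post() -> None:
    raise RuntimeError("simulated network error")


# max_workers=1 means tasks run one at a time, in submission order.
order: list[str] = []
executor = ThreadPoolExecutor(max_workers=1)
submit(executor, lambda: order.append("post"))
failed = submit(executor, failing_post)
submit(executor, lambda: order.append("patch"))
executor.shutdown(wait=True)
```

The caller never blocks on the result, and a failed post does not prevent the later patch from running.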
Replace ~35 Any annotations across _plugin.py and _interceptor.py with
precise types (langsmith.Client, RunTree, _ReplaySafeRunTree, specific
SDK interceptor input types, etc.). Add _InputWithHeaders Protocol for
private helpers matching the OTel interceptor pattern. Narrow return
types to match base class signatures exactly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Prefix unused mock parameters with underscore (_args, _kwargs) and
rename unused variable (_collector) to satisfy basedpyright's
reportUnusedParameter and reportUnusedVariable checks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove useless _get_current_run_safe wrapper (inline get_current_run_tree)
- Restore generic type params on interceptor return types (ActivityHandle[Any],
  ChildWorkflowHandle[Any, Any]) to match base class exactly
- Fix _make_bridge return type (Any → _ContextBridgeRunTree)
- Fix _poll_query helper types (Any → WorkflowHandle, Callable)
- Strengthen weak assertions in mixed sync/async integration tests
- Add _InputWithHeaders Protocol for private helper input params

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wrap all 5 activity definitions with @traceable as outer decorator to
test LangSmith tracing through the full activity execution path. Update
all 9 expected trace hierarchies to account for the additional @traceable
run nested under each RunActivity. Fix outputs assertion to only check
interceptor runs (colon-prefixed names) since @traceable captures actual
return values rather than the interceptor's {'status': 'ok'}.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bug 1: Replace stale _current_run snapshot with ambient context in
outbound interceptor. Add _get_current_run_for_propagation() helper
that filters _ContextBridgeRunTree from ambient context. Outbound
methods now read get_current_run_tree() for @traceable nesting instead
of a frozen reference from workflow entry.

Bug 2: Add tracing_context() to Nexus inbound interceptor for both
execute_nexus_operation_start and execute_nexus_operation_cancel,
matching the activity inbound pattern. Ensures @traceable functions
in Nexus handlers have a LangSmith client even with add_temporal_runs=False.

Remove handler suppression (is_handler check, _workflow_is_active flag)
to align with OTel interceptor which creates spans for all handlers
unconditionally.

Add dump_traces() to test infrastructure for per-root-trace assertions.
Restructure comprehensive tests so user_pipeline only wraps start_workflow,
with polling/signals/queries as independent root traces.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
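
The difference between a frozen snapshot and the ambient context (Bug 1 above) can be illustrated with `contextvars`. This is a sketch under assumptions: `Run`, `PlaceholderRun`, and `current_run_for_propagation` are hypothetical analogues of the run tree, the placeholder parent, and `_get_current_run_for_propagation()`, and the run names are made up.

```python
import contextvars
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    name: str


class PlaceholderRun(Run):
    """Hypothetical analogue of the placeholder parent; never propagated."""


_current_run: contextvars.ContextVar[Optional[Run]] = contextvars.ContextVar(
    "current_run", default=None
)


def current_run_for_propagation() -> Optional[Run]:
    """Read the ambient run at call time, hiding placeholder parents."""
    run = _current_run.get()
    if isinstance(run, PlaceholderRun):
        return None
    return run


_current_run.set(Run("RunWorkflow:Demo"))
snapshot = _current_run.get()  # what a reference frozen at workflow entry sees

token = _current_run.set(Run("outer_chain"))  # a nested traced call is now ambient
parent_seen_by_outbound = current_run_for_propagation()
_current_run.reset(token)

_current_run.set(PlaceholderRun("placeholder"))
parent_with_placeholder = current_run_for_propagation()
```

Reading the context variable at the moment of the outbound call picks up `outer_chain`, whereas the snapshot would still point at the workflow-level run.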
Built-in queries like __temporal_workflow_metadata, __stack_trace, and
__enhanced_stack_trace are fired automatically by infrastructure (e.g.
the Temporal Web UI) and are not user-facing. Filter them out of
LangSmith traces when add_temporal_runs=True to reduce noise.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
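
A filter along these lines could be as small as the sketch below. The three query names come from the commit message; the function name `should_trace_query` and its signature are assumptions for illustration.

```python
# Built-in queries named in the commit message above.
BUILTIN_QUERIES = frozenset(
    {"__temporal_workflow_metadata", "__stack_trace", "__enhanced_stack_trace"}
)


def should_trace_query(query_name: str, add_temporal_runs: bool) -> bool:
    """Return True when a query should get a Temporal run in the trace."""
    return add_temporal_runs and query_name not in BUILTIN_QUERIES
```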
xumaple force-pushed the maplexu/langsmith-plugin branch from f11f660 to d9fb85a (March 30, 2026 20:11)
xumaple and others added 7 commits April 3, 2026 13:08
…s workers

Previously, all workers sharing a LangSmithPlugin used the same
LangSmithInterceptor (and its ThreadPoolExecutor). Now each worker gets
its own interceptor via a factory in configure_worker, while client
interception uses a shared wrapper that only implements client.Interceptor
to avoid being pulled into workers by _init_from_config.

Also removes the sync fallback from _submit (formerly _submit_or_fallback)
so executor-after-shutdown errors surface immediately instead of silently
degrading.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
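
The "surface immediately" behavior relies on a standard-library guarantee: `ThreadPoolExecutor.submit` raises `RuntimeError` once the executor has been shut down. A short sketch, with a placeholder task standing in for a real post:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

executor = ThreadPoolExecutor(max_workers=1)
executor.shutdown(wait=True)

surfaced: Optional[BaseException] = None
try:
    # Without a sync fallback, this error reaches the caller directly.
    executor.submit(lambda: None)
except RuntimeError as exc:
    surfaced = exc
```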
…eRunTree

executor.submit() is not blocked by the workflow sandbox, so the
sandbox_unrestricted context manager around _submit calls in post()
and patch() was unnecessary. Removes the wrappers and corresponding
unit test assertions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The old name was misleading: it doesn't bridge contexts. It's a
factory that sits in the LangSmith tracing context as a placeholder
parent so @traceable can call create_child(), producing independent
root _ReplaySafeRunTree instances with no parent link.

Also removes unnecessary sandbox_unrestricted from post/patch since
executor.submit() is not blocked by the workflow sandbox.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rename manually constructed dicts to more descriptive names:
- kwargs → run_tree_args (used to build RunTree instances)
- ctx_kwargs → tracing_args (used to build tracing_context calls)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- _extract_context / _extract_nexus_context now accept ls_client and
  return fully-formed parents, eliminating 4 call-site fix-ups
- Remove unnecessary _ReplaySafeRunTree unwrap in _make_run — RunTree
  only accesses .id/.dotted_order/.trace_id which delegate transparently
- Simplify tracing_args construction by always including project_name
  and parent (tracing_context treats None same as absent)
- Clean up _workflow_maybe_run: eliminate intermediate factory/
  tracing_parent variables with single conditional expression

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…h traces

StartFoo completes instantly while RunFoo runs for the operation's
lifetime, making the parent-child timing misleading in the UI. Now
headers carry the ambient parent's context instead of StartFoo's, so
RunFoo nests under the same parent as StartFoo.

Adds _traced_start for client outbound start operations (separate from
_traced_call used by query/signal/update which keep parent-child).
Workflow outbound _traced_outbound captures ambient context before
maybe_run for all operations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Covers quick start, example chatbot, add_temporal_runs toggle,
where @traceable works, migration guide, replay safety, and
context propagation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI left a comment

Pull request overview

This PR introduces a new temporalio.contrib.langsmith integration that traces Temporal client/worker operations into LangSmith, with replay-safe context propagation through Temporal headers and an add_temporal_runs toggle to include/exclude Temporal operation nodes.

Changes:

  • Added LangSmithPlugin + LangSmithInterceptor to emit LangSmith run hierarchies for workflows, activities, signals/queries/updates, child workflows, and Nexus operations.
  • Implemented replay-safe tracing via deterministic IDs, workflow-safe time usage, and background-thread I/O for LangSmith post/patch.
  • Added extensive unit/integration/E2E tests plus documentation for the new contrib package.

Reviewed changes

Copilot reviewed 10 out of 13 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| temporalio/contrib/langsmith/_interceptor.py | Core tracing + propagation logic, replay-safe run wrappers, workflow/activity/client/Nexus interceptors. |
| temporalio/contrib/langsmith/_plugin.py | Plugin wiring, sandbox passthrough configuration, and run context flushing. |
| temporalio/contrib/langsmith/__init__.py | Public exports for the contrib package. |
| temporalio/contrib/langsmith/README.md | User-facing documentation and usage examples for the LangSmith integration. |
| tests/contrib/langsmith/conftest.py | In-memory LangSmith client/run collector helpers for tests. |
| tests/contrib/langsmith/test_interceptor.py | Unit tests for interceptor behavior (propagation, replay safety, toggles, Nexus). |
| tests/contrib/langsmith/test_integration.py | Integration/E2E tests against a real Temporal worker and Nexus operations. |
| tests/contrib/langsmith/test_plugin.py | Plugin construction and end-to-end plugin wiring tests. |
| tests/contrib/langsmith/test_background_io.py | Unit tests for executor-backed post/patch, replay suppression, and factory behavior. |
| pyproject.toml | Adds langsmith to dev dependencies for running tests. |
| .gitignore | Ignores CLAUDE.md files. |


DABH left a comment

Not too many comments beyond what's already been said, but a couple of questions/comments for your consideration.

Comment on lines +67 to +71
ls_headers = run_tree.to_headers()
return {
    **headers,
    HEADER_KEY: _payload_converter.to_payloads([ls_headers])[0],
}

Are there any concerns around header size limits? Can we add a comment or something indicating what the expected size of the header would be here? I know Temporal has some header/payload size limits

Author

The LangSmith header is typically ~1-3KB (a dotted-order string plus URL-encoded baggage with metadata/tags/project name). Headers bypass the SDK's PayloadLimitsConfig checks; they only count toward the overall gRPC message size limit (~4MB). Even stacked with OTel headers, total header overhead is <4KB. This is not a practical concern unless a user attaches unusually large metadata dicts. Happy to add a docstring note about the expected size if you'd like, but I don't think it's necessary, since it is more of an implementation detail of langsmith internals.
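
For a rough sense of scale, the header size can be estimated by JSON-encoding the dict. This is a sketch under assumptions: it treats the payload as compact JSON of a string-to-string dict, and the example keys and values are illustrative placeholders rather than output captured from LangSmith. `header_size_bytes` is a hypothetical helper.

```python
import json


def header_size_bytes(headers: dict[str, str]) -> int:
    """Approximate the encoded size of a header dict as compact UTF-8 JSON."""
    return len(json.dumps(headers, separators=(",", ":")).encode("utf-8"))


# Illustrative values only; real headers depend on the run and its metadata.
example_headers = {
    "langsmith-trace": "20260406T000000000000Z" + "0" * 36,
    "baggage": "langsmith-metadata=%7B%22env%22%3A%22dev%22%7D,langsmith-project=demo",
}
example_size = header_size_bytes(example_headers)
```

A check like this could be used in a test to flag metadata dicts that push the header well beyond the few-kilobyte range discussed above.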

@@ -0,0 +1,113 @@
"""Shared test helpers for LangSmith plugin tests."""

Can we add test(s) when an Activity raises an exception? (Does the LangSmith run get properly ended with error status?)

What about workflow cancellation or timeout? These are important for production use since error traces are often the most valuable ones in LangSmith.

xumaple (Author) commented Apr 6, 2026

See ActivityFailureWorkflow in test_integration.py for failed activity. Will add ones for workflow cancellation/timeout, as well as activity timeout/any other errors AI can come up with.

Author

After lengthy research and consideration, I have concluded that it is not worth writing any more tests than what was already present (the activity failure ones). Reasoning: "Termination and timeout are simply not available to user code" (Tim), and in this case we're writing interceptor code, which counts as user code. That means we must treat all cancellations/timeouts/terminations the same, and convention (e.g. OTel) has them all succeeding. We could add a simple test to verify this, but there are additional edge cases, e.g. when a workflow is terminated, it doesn't see that as a reason to fail, but if it was terminated while executing an activity, that activity will fail and propagate an error to the workflow.

At the end of the day, the reason for stopping a workflow is not a factor in whether the trace succeeds or fails, so it doesn't make sense to me to write tests that set up a specific stopping reason and then try to validate the trace's success/error state.

Open to feedback on this reasoning.

Previously, when client=None, each make_interceptor() call created a
new langsmith.Client. This meant per-worker clients were never flushed.
Now a single client is created eagerly in __init__ and shared via the
make_interceptor closure. Also fix WorkerConfig import path for
basedpyright.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
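
The fix described in this commit (create one client eagerly, share it through the factory closure, flush it on shutdown) can be sketched as follows. `FakeClient`, `Interceptor`, and `Plugin` are hypothetical stand-ins; only the sharing pattern is the point.

```python
from typing import Optional


class FakeClient:
    """Hypothetical stand-in for a tracing client with a flush method."""

    def __init__(self) -> None:
        self.flushed = False

    def flush(self) -> None:
        self.flushed = True


class Interceptor:
    def __init__(self, client: FakeClient) -> None:
        self.client = client


class Plugin:
    def __init__(self, client: Optional[FakeClient] = None) -> None:
        # Create the client once, up front, instead of once per factory call.
        shared = client if client is not None else FakeClient()
        self._client = shared
        self.make_interceptor = lambda: Interceptor(shared)

    def shutdown(self) -> None:
        # One flush covers every interceptor, since they share the client.
        self._client.flush()


plugin = Plugin()
first_interceptor = plugin.make_interceptor()
second_interceptor = plugin.make_interceptor()
plugin.shutdown()
```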
_aio_to_thread_patched = False


def _patch_aio_to_thread() -> None:

xumaple (Author) commented Apr 6, 2026

Open question: this monkey patch is pretty benign/safe, since it mimics the default behavior when outside of a workflow, but it is still a monkey patch. I wonder if LangChain could expose an official API for customizing the default async->sync transition, rather than us needing to patch langsmith's internal code?

Author

To add more clarity on what this is doing: the doc comment reads "Functions passed here must not perform blocking I/O." Normally, the post/patch operations passed into this function do perform I/O; however, the integration wraps those implementations with replay-safe wrappers, which allow the I/O to be placed on a separate ThreadPoolExecutor. (So no blocking happens here, by design.) See the _ReplaySafeRunTree._submit function in this file.
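
The general shape of a guarded, idempotent monkey patch like the one discussed here is sketched below. Everything in it is an assumption for illustration: `fake_lib` is a stand-in for the patched module, the real function's signature is not reproduced, and the in-workflow branch is assumed to call the function inline.

```python
import types
from typing import Any, Callable

# Hypothetical stand-in for the third-party module being patched.
fake_lib = types.SimpleNamespace(
    aio_to_thread=lambda fn, *args: ("thread", fn(*args))
)

workflow_state = {"active": False}  # toggled by the demo below
_patched = False  # module-level guard, in the spirit of _aio_to_thread_patched


def patch_aio_to_thread() -> None:
    """Install the wrapper once; later calls are no-ops."""
    global _patched
    if _patched:
        return
    original = fake_lib.aio_to_thread

    def patched(fn: Callable[..., Any], *args: Any) -> Any:
        if workflow_state["active"]:
            # Assumed in-workflow branch: run inline, with no thread hop.
            return ("inline", fn(*args))
        # Outside a workflow, defer to the library's default behavior.
        return original(fn, *args)

    fake_lib.aio_to_thread = patched
    _patched = True


patch_aio_to_thread()
first_wrapper = fake_lib.aio_to_thread
patch_aio_to_thread()  # second call must not wrap the wrapper again

outside_result = fake_lib.aio_to_thread(len, "abc")
workflow_state["active"] = True
inside_result = fake_lib.aio_to_thread(len, "abc")
```

The guard flag keeps repeated plugin construction from stacking wrappers, and delegating to the original outside a workflow is what makes the patch behave like the default in that case.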

Add langsmith>=0.7.0 to [project.optional-dependencies] so users can
install via pip install temporalio[langsmith]. Add Installation section
to the LangSmith plugin README.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
xumaple and others added 9 commits April 7, 2026 11:50
test_constructor_requires_executor and test_constructor_stores_executor
were identical. Remove the duplicate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove per-worker interceptor creation and the
_ClientOnlyLangSmithInterceptor wrapper. The plugin now creates one
LangSmithInterceptor shared across client and all workers, simplifying
the design. run_context flushes the client on shutdown.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add upper bound <0.8 since we monkey-patch langsmith internals
(aio_to_thread). This controls upgrades so internal changes in a new
minor don't silently break the plugin.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Simplify README first sentence, capitalize Temporal abstractions,
  use @traceable consistently, update StartFoo/RunFoo explanation,
  add signals/updates to @traceable table, add LangSmith docs link
- Rename plugin params metadata/tags to default_metadata/default_tags
  to match LangSmithInterceptor API
- Rename _session to _run_with_trace in README example

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Simplify README: merge install into Quick Start, capitalize Temporal
  abstractions, use @traceable consistently, update StartFoo/RunFoo
  explanation, add signals/updates to @traceable table, add LangSmith
  docs link, rename _session to _run_with_trace
- Rename plugin params metadata/tags to default_metadata/default_tags
  to match LangSmithInterceptor API

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SimpleNexusWorkflow was nearly identical to TraceableActivityWorkflow.
Merge them by adding an optional _input param to
TraceableActivityWorkflow and updating NexusService to use it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Merge SimpleNexusWorkflow into TraceableActivityWorkflow to remove
duplication. Add my_unvalidated_update handler to ComprehensiveWorkflow
and verify that ValidateUpdate traces are only created when a validator
is defined.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace 14 repeated list comprehensions filtering traces by root name
with a shared find_traces() helper in conftest.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace screenshot placeholders with actual LangSmith and Temporal UI
images. Update trace example labels (Request -> Query) and improve
Worker crash example description.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

7 participants