
feat(client): add experiment runner context #795

Open
wochinge wants to merge 4 commits into main from tobias/lfe-9365-implement-js-sdk-changes-runnercontext


@wochinge (Contributor) commented Apr 23, 2026

Summary

Add a JS SDK RunnerContext wrapper for experiment-action style flows, plus a RegressionError export for CI regression signaling.

Disclaimer: Experimental PR review

Greptile Summary

This PR adds a RunnerContext class that wraps langfuse.experiment.run with CI-injected defaults (data, datasetVersion, metadata), and a RegressionError class for signaling CI gate failures — the JS equivalent of the Python SDK's RunnerContext.

  • P1 – Incorrect message when threshold is omitted: RegressionErrorMetricOptions declares threshold as optional, but the structured message template always interpolates it, producing "(threshold undefined)" when only metric and value are supplied.
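A minimal TypeScript sketch of the corrected message logic follows, assuming the option names used in the review (`metric`, `value`, `threshold`, `message`). `formatRegressionMessage`, `RegressionMessageOptions`, and `RegressionErrorSketch` are hypothetical names for illustration, not the SDK's exports. It covers the metric-plus-value path both with and without a threshold, which is the case the finding reports as untested.

```typescript
// Hypothetical sketch; option names follow the review, not the SDK source.
interface RegressionMessageOptions {
  metric?: string;
  value?: number;
  threshold?: number; // optional, so the message must not assume it is set
  message?: string; // an explicit message always wins
}

function formatRegressionMessage(opts: RegressionMessageOptions): string {
  const { metric, value, threshold, message } = opts;
  if (message !== undefined) return message;
  if (metric !== undefined && value !== undefined) {
    // Append the threshold only when provided, avoiding "(threshold undefined)".
    const suffix = threshold !== undefined ? ` (threshold ${threshold})` : "";
    return `Regression on \`${metric}\`: ${value}${suffix}`;
  }
  return "Experiment regression detected";
}

// Hypothetical stand-in for the SDK's RegressionError.
class RegressionErrorSketch extends Error {
  constructor(opts: RegressionMessageOptions = {}) {
    super(formatRegressionMessage(opts));
    this.name = "RegressionError";
  }
}

console.log(new RegressionErrorSketch({ metric: "accuracy", value: 0.78 }).message);
// prints: Regression on `accuracy`: 0.78
console.log(new RegressionErrorSketch({ metric: "accuracy", value: 0.78, threshold: 0.8 }).message);
// prints: Regression on `accuracy`: 0.78 (threshold 0.8)
```

Keeping the formatting in a standalone function makes each message path easy to unit-test without constructing the error.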

Confidence Score: 3/5

One P1 bug produces a misleading error message for a valid, type-safe usage of RegressionError; the fix is a one-line template-literal change that should land before merging.

A single P1 finding with a straightforward fix caps confidence at 4; the bug is in a public-facing API surface (RegressionError.message) and has no existing test coverage for the affected path, pulling the score to 3.

packages/client/src/experiment/RunnerContext.ts — threshold interpolation in RegressionError constructor

Important Files Changed

| Filename | Overview |
|---|---|
| packages/client/src/experiment/RunnerContext.ts | New RunnerContext wrapper and RegressionError class; threshold rendering bug when threshold is undefined in the metric message path. |
| packages/client/src/experiment/RunnerContext.test.ts | Comprehensive tests for RunnerContext and RegressionError; missing coverage for the metric + value without threshold case. |
| packages/client/src/index.ts | Adds re-export of RunnerContext.js; correctly placed in the experiment exports block. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant CI as CI / GitHub Action
    participant RC as RunnerContext
    participant EM as LangfuseClient.experiment
    participant LF as Langfuse API

    CI->>RC: new RunnerContext({ client, data, datasetVersion, metadata })
    CI->>RC: runExperiment({ name, task, [overrides] })
    RC->>RC: resolve data (call ?? ctx)
    RC->>RC: merge metadata (ctx + call, call wins)
    RC->>EM: experiment.run({ ...merged params })
    EM->>LF: create dataset run items, scores
    LF-->>EM: ExperimentResult
    EM-->>RC: ExperimentResult
    RC-->>CI: ExperimentResult

    alt regression detected
        CI->>CI: throw new RegressionError({ result, metric, value, threshold? })
    end
```
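The two RunnerContext steps in the diagram (resolve data with `call ?? ctx`, then merge metadata with call-time keys winning) can be sketched as follows. This is a hypothetical, self-contained illustration rather than the PR's code: `RunnerContextSketch` and the injected `run` function stand in for `RunnerContext` and `langfuse.experiment.run`, the parameter shapes are simplified, and resolving `datasetVersion` the same way as `data` is an assumption based on the summary's list of CI-injected defaults.

```typescript
// Hypothetical sketch of RunnerContext's default resolution; not the SDK source.
type Metadata = Record<string, unknown>;

interface RunParamsSketch<TItem> {
  name: string;
  data?: TItem[];
  datasetVersion?: string; // assumed to be a string for this sketch
  metadata?: Metadata;
}

type ContextDefaults<TItem> = Omit<RunParamsSketch<TItem>, "name">;

// Stands in for langfuse.experiment.run; TResult may itself be a Promise.
type RunFn<TItem, TResult> = (params: RunParamsSketch<TItem>) => TResult;

class RunnerContextSketch<TItem, TResult> {
  private readonly run: RunFn<TItem, TResult>;
  private readonly defaults: ContextDefaults<TItem>;

  constructor(run: RunFn<TItem, TResult>, defaults: ContextDefaults<TItem> = {}) {
    this.run = run;
    this.defaults = defaults;
  }

  runExperiment(call: RunParamsSketch<TItem>): TResult {
    return this.run({
      ...call,
      // Call-time values win; context defaults fill the gaps.
      data: call.data ?? this.defaults.data,
      datasetVersion: call.datasetVersion ?? this.defaults.datasetVersion,
      // Shallow merge: a call-time key replaces the context key wholesale.
      metadata: { ...this.defaults.metadata, ...call.metadata },
    });
  }
}

// Usage with a stub runner that records what it receives.
const calls: RunParamsSketch<string>[] = [];
const ctx = new RunnerContextSketch<string, string>(
  (params) => {
    calls.push(params);
    return "done";
  },
  { data: ["a", "b"], datasetVersion: "v1", metadata: { commit: "abc123", env: { ci: true } } },
);
ctx.runExperiment({ name: "nightly", metadata: { env: { local: true } } });
// calls[0].data is ["a", "b"] (the context default), and calls[0].metadata is
// { commit: "abc123", env: { local: true } }: the nested env object was replaced, not merged.
```

Because the merge is a plain object spread, it is shallow: nested metadata objects supplied at call time replace the context's value for that key rather than being combined with it.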
Prompt To Fix All With AI
This is a comment left during a code review.
Path: packages/client/src/experiment/RunnerContext.ts
Line: 151-154

Comment:
**`threshold: undefined` renders literally in the message**

`RegressionErrorMetricOptions` marks `threshold` as optional (`threshold?: number`). When a caller provides `metric` and `value` but omits `threshold`, the condition `metric !== undefined && value !== undefined` is `true`, so the template literal is used; since `threshold` is `undefined`, this produces the string ``"Regression on `accuracy`: 0.78 (threshold undefined)"``. No test covers the `metric` + `value` without `threshold` path, so the bug goes undetected.

```suggestion
      message ??
        (metric !== undefined && value !== undefined
          ? `Regression on \`${metric}\`: ${value}${threshold !== undefined ? ` (threshold ${threshold})` : ""}`
          : "Experiment regression detected"),
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "Merge branch 'main' into tobias/lfe-9365..."

Greptile also left 1 inline comment on this PR.

@vercel Bot commented Apr 23, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| langfuse-js | Ready | Preview | Apr 29, 2026 3:24pm |


@wochinge marked this pull request as ready for review April 29, 2026 15:07
@github-actions

@claude review

Comment thread packages/client/src/experiment/RunnerContext.ts

@chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 04c50baa99

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/client/src/experiment/RunnerContext.ts
Comment thread packages/client/src/experiment/RunnerContext.ts

@claude Bot left a comment


Prior feedback was addressed in b975638; the threshold suffix is now conditional and covered by a new test.

Extended reasoning...

Overview

This PR adds a RunnerContext wrapper class around langfuse.experiment.run (with CI-injected defaults for data, datasetVersion, and metadata) and a RegressionError class for signaling CI gate failures. It also adds the corresponding test file and re-exports the new module from the client package index. The scope is small: 3 files and roughly 410 lines, mostly new.

Security risks

None. The code is a pure client-side wrapper with no I/O, no auth, no crypto, no input parsing of untrusted data. metadata is shallow-merged with call-time keys winning; data is passed through unchanged to the existing experiment runner.

Level of scrutiny

Low-to-medium. This is new, additive, public API surface (@public JSDoc) intended to pair with the langfuse/experiment-action GitHub Action and mirror the Python SDK RunnerContext. The logic is straightforward (default resolution + metadata merge + error formatting), and behavior is well-covered by the new unit tests.

Other factors

The single P1/P2 finding raised by Greptile, the Codex bot, and my prior review (rendering "(threshold undefined)" when threshold is omitted from the structured form) was fixed in b975638 with a conditional template suffix and an accompanying "omits the threshold suffix when no threshold is provided" test. Type-level negative cases (@ts-expect-error) are also exercised. No unresolved review threads remain.

@wochinge requested a review from hassiebp April 29, 2026 16:02
