Skip to content

fix(analyze): preserve sample indices and fix threshold truthiness#2377

Draft
ryan-arman wants to merge 1 commit intomainfrom
ryan-arman/analyze-test-engine-fixes
Draft

fix(analyze): preserve sample indices and fix threshold truthiness#2377
ryan-arman wants to merge 1 commit intomainfrom
ryan-arman/analyze-test-engine-fixes

Conversation

@ryan-arman
Copy link
Copy Markdown
Contributor

Description

Part of the analyze v2 split of #2370 (Phase 3 of 4). Standalone fixes — not dependent on PRs #2375 / #2376.

Two related fixes to the analyze test engine:

  1. Preserve original sample indices. TestEngine._extract_metric_values now returns (original_index, value) pairs so that sample_indices and all_affected_indices on the resulting TestResult point to real conversation positions. Previously, when a sample was missing the metric (filtered out), the reported indices were offsets into the filtered list, which no longer matched the dataset — the user was told to look at conversation 2 when the problem was really in conversation 3.

  2. Fix threshold truthiness bug. threshold=test.max_percentage or test.min_percentage silently fell through to min_percentage when max_percentage was 0.0. The result's reported threshold would say "0% required" instead of the configured "0% allowed." Replaced with an explicit is not None check in both the in-memory TestEngine and the incremental BatchTestEngine.

Includes regression tests for both fixes in tests/unit/analyze/test_testing_engine.py.

Related issues

Linear Issue: OPE-1868
Towards OPE-1868

Before submitting

  • This PR only changes documentation. (You can ignore the following checks in that case)
  • Did you read the contributor guideline Pull Request guidelines?
  • Did you link the issue(s) related to this PR in the section above?
  • Did you add / update tests where needed?

🤖 Generated with Claude Code

Two related fixes to the analyze test engine:

1. ``_extract_metric_values`` now returns ``(original_index, value)`` pairs
   so that ``sample_indices`` and ``all_affected_indices`` on a
   ``TestResult`` point to real conversation positions. Previously, when a
   sample was missing the metric (filtered out), the reported indices were
   offsets into the filtered list, which no longer matched the dataset.

2. ``threshold=test.max_percentage or test.min_percentage`` silently fell
   through to ``min_percentage`` when ``max_percentage`` was ``0.0``. Use
   an explicit ``is not None`` check instead. Applied to both the in-memory
   ``TestEngine`` and the incremental ``BatchTestEngine``.

Adds regression tests for both behaviors.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@gitar-bot
Copy link
Copy Markdown

gitar-bot bot commented Apr 17, 2026

Reviewing your code

Gitar

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant