fix(analyze): preserve sample indices and fix threshold truthiness #2377
Draft
ryan-arman wants to merge 1 commit into main
Two related fixes to the analyze test engine:

1. ``_extract_metric_values`` now returns ``(original_index, value)`` pairs so that ``sample_indices`` and ``all_affected_indices`` on a ``TestResult`` point to real conversation positions. Previously, when a sample was missing the metric (filtered out), the reported indices were offsets into the filtered list, which no longer matched the dataset.
2. ``threshold=test.max_percentage or test.min_percentage`` silently fell through to ``min_percentage`` when ``max_percentage`` was ``0.0``. Use an explicit ``is not None`` check instead. Applied to both the in-memory ``TestEngine`` and the incremental ``BatchTestEngine``.

Adds regression tests for both behaviors.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
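For reviewers, the index drift in the first fix can be reproduced with a minimal sketch. The function names, the dict-based sample shape, and the ``length`` metric below are illustrative assumptions, not the engine's actual code; only the idea of returning ``(original_index, value)`` pairs comes from this PR.

```python
from typing import Any

Sample = dict[str, Any]  # hypothetical sample shape, for illustration only


def extract_metric_values(samples: list[Sample], metric: str) -> list[tuple[int, float]]:
    """Return (original_index, value) pairs, skipping samples that lack the metric."""
    return [
        (index, sample[metric])
        for index, sample in enumerate(samples)
        if sample.get(metric) is not None
    ]


def indices_above(pairs: list[tuple[int, float]], max_value: float) -> list[int]:
    """Report the original dataset positions whose value exceeds max_value."""
    return [index for index, value in pairs if value > max_value]


samples = [
    {"length": 10},
    {},               # metric missing: this sample is filtered out
    {"length": 50},
    {"length": 900},  # the real offender, at dataset position 3
]

pairs = extract_metric_values(samples, "length")
affected = indices_above(pairs, max_value=100)

# Old-style behavior for comparison: indices are offsets into the filtered list.
filtered_values = [value for _, value in pairs]
stale = [offset for offset, value in enumerate(filtered_values) if value > 100]
```

Here ``affected`` points at position 3 in the dataset, while ``stale`` points at offset 2 in the filtered list, which is the mismatch this PR removes.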
Description
Part of the analyze v2 split of #2370 (Phase 3 of 4). Standalone fixes — not dependent on PRs #2375 / #2376.
Two related fixes to the analyze test engine:
1. Preserve original sample indices. ``TestEngine._extract_metric_values`` now returns ``(original_index, value)`` pairs so that ``sample_indices`` and ``all_affected_indices`` on the resulting ``TestResult`` point to real conversation positions. Previously, when a sample was missing the metric (filtered out), the reported indices were offsets into the filtered list, which no longer matched the dataset: the user was told to look at conversation 2 when the problem was really in conversation 3.
2. Fix threshold truthiness bug. ``threshold=test.max_percentage or test.min_percentage`` silently fell through to ``min_percentage`` when ``max_percentage`` was ``0.0``. The result's reported threshold would say "0% required" instead of the configured "0% allowed." Replaced with an explicit ``is not None`` check in both the in-memory ``TestEngine`` and the incremental ``BatchTestEngine``.

Includes regression tests for both fixes in ``tests/unit/analyze/test_testing_engine.py``.

Related issues
Linear Issue: OPE-1868
Towards OPE-1868
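As a note for reviewers, the threshold truthiness issue described above can be sketched in isolation. The ``PercentageTest`` dataclass and both helper functions are hypothetical stand-ins, not the engine's real types; only the ``or`` expression and the ``is not None`` replacement come from this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PercentageTest:
    """Hypothetical stand-in for a test config with optional percentage bounds."""
    max_percentage: Optional[float] = None
    min_percentage: Optional[float] = None


def threshold_with_or(test: PercentageTest) -> Optional[float]:
    # Buggy form: 0.0 is falsy, so a configured max of 0.0 is skipped.
    return test.max_percentage or test.min_percentage


def threshold_with_none_check(test: PercentageTest) -> Optional[float]:
    # Fixed form: only a missing (None) max falls through to min.
    if test.max_percentage is not None:
        return test.max_percentage
    return test.min_percentage


zero_allowed = PercentageTest(max_percentage=0.0)
buggy = threshold_with_or(zero_allowed)
fixed = threshold_with_none_check(zero_allowed)
```

With ``max_percentage=0.0`` and no ``min_percentage``, the ``or`` form yields ``None`` while the explicit check yields ``0.0``. The same falsiness applies to any numeric setting where zero is a legitimate configured value.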
Before submitting
🤖 Generated with Claude Code