⚡️ Speed up method `TestResults.timing_coefficient_of_variation` by 338% in PR #1949 (`cf-1082-benchmark-noise-floor`) by codeflash-ai[bot] · Pull Request #1955 · codeflash-ai/codeflash

codeflash-ai · 2026-04-01T17:51:15Z

⚡️ This pull request contains optimizations for PR #1949

If you approve this dependent PR, these changes will be merged into the original PR branch cf-1082-benchmark-noise-floor.

This PR will be automatically closed if the original PR is merged.

📄 338% (3.38x) speedup for `TestResults.timing_coefficient_of_variation` in `codeflash/models/models.py`

⏱️ Runtime : 245 microseconds → 55.8 microseconds (best of 250 runs)

📝 Explanation and details

The hot path, `timing_coefficient_of_variation()`, now uses Welford's single-pass algorithm to compute the sample mean and sample standard deviation in one traversal. Previously it called `statistics.mean()` and `statistics.stdev()` separately, and each of those iterates over the list. Line profiling shows that `statistics.stdev()` consumed 47.6% of the original function's runtime. The new `_compute_sample_cv` helper accounts for only 16.2%, because it eliminates the redundant passes and the overhead of Python's general-purpose `statistics` module. Overall runtime drops 77% (245 µs → 55.8 µs). This is a meaningful saving in `process_single_candidate`, where this method gates candidate evaluation.
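The single-pass idea can be sketched in a few lines, under some assumptions. The helper here takes a plain list of runtimes, and the name `compute_sample_cv` and its signature are illustrative only. The actual helper in `codeflash/models/models.py` is not reproduced here and may differ.

The sketch keeps a running mean and `m2`, the running sum of squared deviations from that mean. It returns 0.0 in three cases: fewer than two samples, a zero mean, or rounding that leaves a non-positive variance.

```python
import math


def compute_sample_cv(runtimes: list[float]) -> float:
    """Sample coefficient of variation (stdev / mean) via Welford's one-pass update.

    Illustrative sketch only; not the code merged in this PR.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in runtimes:
        n += 1
        delta = x - mean          # deviation from the old mean
        mean += delta / n         # update the running mean
        m2 += delta * (x - mean)  # old-mean deviation times new-mean deviation
    if n < 2 or mean == 0.0:
        return 0.0  # CV is undefined for fewer than two samples or a zero mean
    sample_variance = m2 / (n - 1)  # n - 1 denominator, as statistics.stdev uses
    if sample_variance <= 0.0:
        return 0.0  # guard against rounding leaving a non-positive variance
    return math.sqrt(sample_variance) / mean
```

For `[100, 110, 90]`, the running mean ends at 100 and `m2` ends at 200. That gives a sample variance of 100, a standard deviation of 10, and a CV of 0.1. This is the same value as `statistics.stdev(data) / statistics.mean(data)`, but reached in one pass. Because the input is consumed only once, the sketch also works on a generator.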

✅ Correctness verification report:

Test	Status
⚙️ Existing Unit Tests	✅ 76 Passed
🌀 Generated Regression Tests	✅ 4 Passed
⏪ Replay Tests	🔘 None Found
🔎 Concolic Coverage Tests	🔘 None Found
📊 Tests Coverage	80.0%

⚙️ Existing Unit Tests

Test File::Test Function	Original ⏱️	Optimized ⏱️	Speedup
`test_critic.py::test_timing_coefficient_of_variation`	85.3μs	16.4μs	419%✅
`test_critic.py::test_timing_cv_multi_test_case`	123μs	16.3μs	656%✅

🌀 Generated Regression Tests

```python
# imports
from pathlib import Path

from codeflash.models.models import FunctionTestInvocation, InvocationId, TestResults
from codeflash.models.test_type import TestType


def test_timing_coefficient_of_variation_empty_test_results():
    test_results = TestResults()
    assert test_results.timing_coefficient_of_variation() == 0.0  # 1.57μs -> 1.57μs (0.000% faster)


def test_timing_coefficient_of_variation_only_none_runtimes():
    invocation_id1 = InvocationId(
        test_module_path="test_module",
        test_class_name=None,
        test_function_name="test_test1",
        function_getting_tested="func_under_test",
        iteration_id="1",
    )
    inv1 = FunctionTestInvocation(
        id=invocation_id1,
        loop_index=1,
        test_type=TestType.GENERATED_REGRESSION,
        verification_type="function_call",
        test_framework="pytest",
        runtime=None,
        did_pass=True,
        file_name=Path("test.py"),
        return_value=42,
        timed_out=False,
    )
    invocation_id2 = InvocationId(
        test_module_path="test_module",
        test_class_name=None,
        test_function_name="test_test2",
        function_getting_tested="func_under_test",
        iteration_id="1",
    )
    inv2 = FunctionTestInvocation(
        id=invocation_id2,
        loop_index=1,
        test_type=TestType.GENERATED_REGRESSION,
        verification_type="function_call",
        test_framework="pytest",
        runtime=None,
        did_pass=True,
        file_name=Path("test.py"),
        return_value=42,
        timed_out=False,
    )
    test_results = TestResults(test_results=[inv1, inv2])
    assert test_results.timing_coefficient_of_variation() == 0.0  # 17.1μs -> 17.3μs (0.875% slower)
```

To edit these changes, run `git checkout codeflash/optimize-pr1949-2026-04-01T17.51.09` and push.


claude · 2026-04-01T17:52:00Z

Claude finished @codeflash-ai[bot]'s task in 5m 8s.

PR Review Summary

Prek Checks

✅ Passes. No issues introduced by this PR. (14 pre-existing mypy errors in models.py unrelated to these changes.)

Code Review

Correctness: The Welford single-pass algorithm is implemented correctly:

- The `n < 2` guard mirrors the original `len(runtimes) < 2` check.
- The `mean == 0.0` guard mirrors the original `mean == 0` check.
- It uses `m2 / (n - 1)` (sample variance), consistent with `statistics.stdev`, which also uses n - 1.
- The `sample_variance <= 0.0` guard correctly handles floating-point rounding artefacts.
- `statistics.median` is still used for the final aggregation, so the local `import statistics` at line 992 is still needed.
- `import math` was added at top level for `math.sqrt`, which is correct.

Style fix applied: _compute_sample_cv was renamed to compute_sample_cv per the project naming convention (CLAUDE.md: "NEVER use leading underscores"). Committed and pushed.

No bugs, security issues, or breaking API changes found.

Duplicate Detection

No duplicates detected. compute_sample_cv is a unique local helper.

Other open codeflash-ai PRs

- #1954 (⚡️ Speed up method `TestResults.total_passed_runtime` by 20% in PR #1949; same base branch, `cf-1082-benchmark-noise-floor`): CI is still pending. The `unit-tests (windows-latest, 3.13)` failure is pre-existing on the base branch; it also fails on #1949, "fix: use median timing and variance-aware noise floor to reduce benchmark false positives".
- #1943 (⚡️ Speed up function `fmt_delta` by 11% in PR #1941; base branch `cf-compare-copy-benchmarks`): There are multiple CI failures. However, the code change (string formatting via the `%` operator instead of an f-string) cannot cause end-to-end optimization test failures, so these failures are pre-existing on the base branch. Leaving open.

Last updated: 2026-04-01T17:57Z


codeflash-ai bot added the labels ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) on Apr 1, 2026.

codeflash-ai bot mentioned this pull request on Apr 1, 2026, in #1949, "fix: use median timing and variance-aware noise floor to reduce benchmark false positives" (Open).

style: rename _compute_sample_cv to compute_sample_cv per naming conventions (commit 319e8dd)

claude bot mentioned this pull request on Apr 1, 2026, in #1956, "⚡️ Speed up function speedup_critic by 10% in PR #1949 (cf-1082-benchmark-noise-floor)" (Merged).

claude bot merged commit fc9a6b2 into cf-1082-benchmark-noise-floor Apr 1, 2026
24 of 26 checks passed

claude bot deleted the codeflash/optimize-pr1949-2026-04-01T17.51.09 branch April 1, 2026 18:12

claude bot mentioned this pull request on Apr 1, 2026, in #1957, "fix: Add .js extensions to relative imports in ESM TypeScript projects" (Merged, 4 tasks).

claude[bot] merged 2 commits into cf-1082-benchmark-noise-floor from codeflash/optimize-pr1949-2026-04-01T17.51.09.

codeflash-ai bot commented Apr 1, 2026

Uh oh!

claude bot commented Apr 1, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

0 participants

Conversation

codeflash-ai bot commented Apr 1, 2026

⚡️ This pull request contains optimizations for PR #1949

📄 338% (3.38x) speedup for TestResults.timing_coefficient_of_variation in codeflash/models/models.py

📝 Explanation and details

Uh oh!

claude bot commented Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Summary

Prek Checks

Code Review

Duplicate Detection

Other open codeflash-ai PRs

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

0 participants

📄 338% (3.38x) speedup for `TestResults.timing_coefficient_of_variation` in `codeflash/models/models.py`

claude bot commented Apr 1, 2026 •

edited

Loading