
⚡️ Speed up function validate_and_format_benchmark_table by 45% in PR #1941 (cf-compare-copy-benchmarks) #1975

Merged
claude[bot] merged 2 commits into cf-compare-copy-benchmarks from codeflash/optimize-pr1941-2026-04-02T18.50.56
Apr 3, 2026

Conversation


codeflash-ai bot (Contributor) commented Apr 2, 2026

⚡️ This pull request contains optimizations for PR #1941

If you approve this dependent PR, these changes will be merged into the original PR branch cf-compare-copy-benchmarks.

This PR will be automatically closed if the original PR is merged.


📄 45% (0.45x) speedup for validate_and_format_benchmark_table in codeflash/benchmarking/utils.py

⏱️ Runtime: 1.26 milliseconds → 869 microseconds (best of 5 runs)

📝 Explanation and details

The hot path shows `logger.debug` consuming 18.3% of the original runtime despite appearing infrequently (141 hits), because the f-string is formatted unconditionally even when debug logging is disabled. Wrapping the call in `logger.isEnabledFor(logging.DEBUG)` defers string construction until it is confirmed necessary, eliminating the wasted formatting. Replacing `lambda x: x[3]` with `operator.itemgetter(3)` in the sort key reduces per-comparison overhead from a Python function call to a C-level indexed lookup, and hoisting the division constant `1_000_000.0` outside the loop avoids reconstructing the float on every iteration. The line profiler confirms the sort line dropped from 568 µs to 197 µs (65% faster) and the debug call from 1102 µs to 124 µs (89% faster), yielding a 45% overall speedup with no correctness or metric trade-offs.
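The three changes can be sketched together on a simplified version of the loop. This is a hypothetical reconstruction shaped by the tests below, not the actual source of `validate_and_format_benchmark_table` in `codeflash/benchmarking/utils.py`:

```python
import logging
from operator import itemgetter

logger = logging.getLogger(__name__)


def format_rows(rows):
    """Illustrative only: rows are (key, total_ns, func_ns) triples."""
    ns_to_ms = 1_000_000.0  # hoisted: built once instead of on every iteration
    out = []
    for key, total_ns, func_ns in rows:
        if func_ns > total_ns:
            # Guard: the f-string is only formatted when DEBUG is actually enabled
            if logger.isEnabledFor(logging.DEBUG):
                logger.debug(f"{key}: func time {func_ns}ns exceeds total {total_ns}ns")
            out.append((key, 0.0, 0.0, 0.0))
        elif total_ns > 0:
            out.append((key, total_ns / ns_to_ms, func_ns / ns_to_ms,
                        func_ns / total_ns * 100.0))
    # itemgetter(3) sorts on the percentage column via a C-level key function,
    # avoiding a Python-level lambda call per comparison
    out.sort(key=itemgetter(3), reverse=True)
    return out
```

The `isEnabledFor` guard matters precisely because the message is an f-string: `logger.debug(f"...")` evaluates its argument before logging decides whether to emit it, whereas `%`-style lazy arguments would defer formatting automatically.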

Correctness verification report:

Test | Status
⚙️ Existing Unit Tests | 9 Passed
🌀 Generated Regression Tests | 34 Passed
⏪ Replay Tests | 🔘 None Found
🔎 Concolic Coverage Tests | 🔘 None Found
📊 Tests Coverage | 100.0%
⚙️ Click to see Existing Unit Tests
Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup
test_trace_benchmarks.py::test_trace_benchmark_decorator | 6.49μs | 6.49μs | 0.000% ✅
test_trace_benchmarks.py::test_trace_multithreaded_benchmark | 6.20μs | 5.88μs | 5.44% ✅
🌀 Click to see Generated Regression Tests
from typing import Dict, List, Tuple

# imports
import pytest  # used for our unit tests

# import the real function and dataclass under test
from codeflash.benchmarking.utils import validate_and_format_benchmark_table
from codeflash.models.models import BenchmarkKey


def test_basic_conversion_and_percentage():
    # Create a BenchmarkKey for a simple benchmark
    key = BenchmarkKey(module_path="module.a", function_name="fast_func")
    # The function under test expects nanoseconds. Use 500_000_000 ns (0.5s) for func,
    # and 1_000_000_000 ns (1.0s) for total -> percentage should be 50.0
    function_benchmark_timings: Dict[str, Dict[BenchmarkKey, float]] = {"package.path:func": {key: 500_000_000.0}}
    total_benchmark_timings: Dict[BenchmarkKey, float] = {key: 1_000_000_000.0}

    # Call the function under test
    result = validate_and_format_benchmark_table(
        function_benchmark_timings, total_benchmark_timings
    )  # 3.13μs -> 3.23μs (3.10% slower)

    # There should be an entry for our function path
    assert "package.path:func" in result
    entries = result["package.path:func"]

    # Only one benchmark entry expected
    assert isinstance(entries, list) and len(entries) == 1

    # Unpack the returned tuple and verify conversions: total ms, func ms, percentage
    returned_key, total_ms, func_ms, percentage = entries[0]
    assert returned_key == key  # BenchmarkKey equality by fields (dataclass)
    # nanoseconds -> milliseconds: 1_000_000_000 ns == 1000.0 ms
    assert total_ms == pytest.approx(1000.0)
    # 500_000_000 ns == 500.0 ms
    assert func_ms == pytest.approx(500.0)
    # Percentage 50%
    assert percentage == pytest.approx(50.0)


def test_skip_when_func_time_greater_than_total():
    # If the function's measured time is greater than the total reported time,
    # the implementation appends a zeroed tuple for that benchmark.
    key = BenchmarkKey(module_path="modX", function_name="weird_case")
    function_benchmark_timings = {"fpath": {key: 2_000_000_000.0}}  # 2s
    total_benchmark_timings = {key: 1_000_000_000.0}  # 1s total

    result = validate_and_format_benchmark_table(
        function_benchmark_timings, total_benchmark_timings
    )  # 9.80μs -> 3.14μs (212% faster)

    # Expect a single entry that is zeroed out
    entries = result["fpath"]
    assert len(entries) == 1
    returned_key, total_ms, func_ms, percentage = entries[0]
    assert returned_key == key
    assert total_ms == pytest.approx(0.0)
    assert func_ms == pytest.approx(0.0)
    assert percentage == pytest.approx(0.0)


def test_missing_total_and_zero_func_time_results_in_omission():
    # If total isn't provided (defaults to 0) and function time is exactly 0,
    # the code does not append any tuple for that benchmark.
    key = BenchmarkKey(module_path="m", function_name="zero_case")
    function_benchmark_timings = {"only_func": {key: 0.0}}
    total_benchmark_timings = {}  # missing key -> treated as total_time == 0

    result = validate_and_format_benchmark_table(
        function_benchmark_timings, total_benchmark_timings
    )  # 2.35μs -> 2.67μs (11.7% slower)

    # The function entry should exist with an empty list (no benchmarks appended)
    assert "only_func" in result
    assert isinstance(result["only_func"], list)
    assert result["only_func"] == []


def test_multiple_benchmarks_sorted_descending_percentage():
    # Create two benchmarks with differing percentages and ensure the output is sorted
    k_low = BenchmarkKey(module_path="pkg", function_name="low")
    k_high = BenchmarkKey(module_path="pkg", function_name="high")
    # low: 100_000_000 / 1_000_000_000 => 10%
    # high: 500_000_000 / 1_000_000_000 => 50%
    function_benchmark_timings = {"multi_func": {k_low: 100_000_000.0, k_high: 500_000_000.0}}
    total_benchmark_timings = {k_low: 1_000_000_000.0, k_high: 1_000_000_000.0}

    result = validate_and_format_benchmark_table(
        function_benchmark_timings, total_benchmark_timings
    )  # 4.14μs -> 4.00μs (3.53% faster)
    entries = result["multi_func"]

    # Ensure two entries present and sorted descending by percentage (highest first)
    assert len(entries) == 2
    first = entries[0]
    second = entries[1]
    # first should correspond to k_high (50%), second to k_low (10%)
    assert first[0] == k_high
    assert pytest.approx(first[3]) == 50.0
    assert second[0] == k_low
    assert pytest.approx(second[3]) == 10.0


def test_total_zero_with_nonzero_func_appends_zeroed_entry():
    # If total_time is explicitly zero and func_time > total_time (0),
    # the implementation will append a zeroed tuple (skip scenario).
    key = BenchmarkKey(module_path="edge", function_name="total_zero")
    function_benchmark_timings = {"ef": {key: 1.0}}  # tiny non-zero func time
    total_benchmark_timings = {key: 0.0}  # explicit zero total

    result = validate_and_format_benchmark_table(
        function_benchmark_timings, total_benchmark_timings
    )  # 9.42μs -> 3.09μs (205% faster)
    entries = result["ef"]
    assert len(entries) == 1
    _, total_ms, func_ms, percentage = entries[0]
    assert total_ms == pytest.approx(0.0)
    assert func_ms == pytest.approx(0.0)
    assert percentage == pytest.approx(0.0)


def test_multiple_function_paths_sorted_by_percentage():
    # Multiple function paths; each result list is sorted by descending percentage.
    k1 = BenchmarkKey(module_path="valid", function_name="test1")
    k2 = BenchmarkKey(module_path="valid", function_name="test2")
    k3 = BenchmarkKey(module_path="valid", function_name="test3")

    function_benchmark_timings = {"func1": {k1: 250_000_000.0, k2: 750_000_000.0}, "func2": {k3: 100_000_000.0}}
    total_benchmark_timings = {k1: 1_000_000_000.0, k2: 1_000_000_000.0, k3: 1_000_000_000.0}

    result = validate_and_format_benchmark_table(
        function_benchmark_timings, total_benchmark_timings
    )  # 5.44μs -> 4.86μs (12.0% faster)

    assert "func1" in result
    assert "func2" in result
    entries_func1 = result["func1"]
    entries_func2 = result["func2"]

    assert len(entries_func1) == 2
    assert len(entries_func2) == 1

    assert entries_func1[0][0] == k2
    assert entries_func1[0][3] == pytest.approx(75.0)

    assert entries_func1[1][0] == k1
    assert entries_func1[1][3] == pytest.approx(25.0)

    assert entries_func2[0][0] == k3
    assert entries_func2[0][3] == pytest.approx(10.0)


def test_large_scale_many_functions_and_benchmarks():
    # Build a large-scale scenario: 100 functions each with 10 benchmarks -> 1000 items
    num_functions = 100
    benchmarks_per_function = 10

    function_benchmark_timings: Dict[str, Dict[BenchmarkKey, float]] = {}
    total_benchmark_timings: Dict[BenchmarkKey, float] = {}

    # Populate inputs with deterministic values so the test is deterministic
    for i in range(num_functions):
        func_path = f"module.func_{i}"
        inner: Dict[BenchmarkKey, float] = {}
        for j in range(benchmarks_per_function):
            # Create unique BenchmarkKey per function and benchmark index
            bk = BenchmarkKey(module_path=f"m{i}", function_name=f"b{j}")
            # Give increasing func times to produce different percentages
            func_time_ns = float((j + 1) * 10_000_000)  # 10ms, 20ms, ... in ns units
            total_time_ns = float((benchmarks_per_function + 1) * 10_000_000)  # common total larger than any func time
            inner[bk] = func_time_ns
            total_benchmark_timings[bk] = total_time_ns
        function_benchmark_timings[func_path] = inner

    # Call the function under test
    result = validate_and_format_benchmark_table(
        function_benchmark_timings, total_benchmark_timings
    )  # 434μs -> 378μs (14.9% faster)

    # Validate structure and counts
    assert isinstance(result, dict)
    assert len(result) == num_functions

    # Every function should have benchmarks_per_function entries
    for func_path, entries in result.items():
        assert isinstance(entries, list)
        assert len(entries) == benchmarks_per_function

        # Ensure percentages are sorted in non-increasing order for each function
        percentages: List[float] = [entry[3] for entry in entries]
        # Check monotonic non-increasing
        for a, b in zip(percentages, percentages[1:]):
            assert a >= b - 1e-12  # allow tiny floating error tolerance

    # Spot-check a couple of numeric conversions for determinism
    sample_func = "module.func_0"
    sample_entries: List[Tuple[BenchmarkKey, float, float, float]] = result[sample_func]
    # The top percentage should correspond to the largest j (since func_time increases with j)
    top_entry = sample_entries[0]
    top_key, top_total_ms, top_func_ms, top_percentage = top_entry
    # total_time_ns was (benchmarks_per_function + 1) * 10_000_000 => in ms:
    expected_total_ms = ((benchmarks_per_function + 1) * 10_000_000) / 1_000_000.0
    assert top_total_ms == pytest.approx(expected_total_ms)
    # top func time is (benchmarks_per_function) * 10_000_000 ns in ms
    expected_top_func_ms = (benchmarks_per_function * 10_000_000) / 1_000_000.0
    assert top_func_ms == pytest.approx(expected_top_func_ms)
    # percentage should be func / total * 100
    expected_percentage = (benchmarks_per_function * 10_000_000) / ((benchmarks_per_function + 1) * 10_000_000) * 100.0
    assert top_percentage == pytest.approx(expected_percentage)
# imports
import pytest

from codeflash.benchmarking.utils import validate_and_format_benchmark_table
from codeflash.models.models import BenchmarkKey


def test_empty_input_dicts():
    """Test with empty input dictionaries."""
    result = validate_and_format_benchmark_table({}, {})  # 842ns -> 861ns (2.21% slower)
    assert result == {}
    assert isinstance(result, dict)


def test_single_function_single_benchmark():
    """Test with a single function and single benchmark."""
    key = BenchmarkKey("module.py", "func")
    function_timings = {"func_path": {key: 1_000_000.0}}
    total_timings = {key: 2_000_000.0}

    result = validate_and_format_benchmark_table(function_timings, total_timings)  # 3.51μs -> 3.37μs (4.19% faster)

    assert "func_path" in result
    assert len(result["func_path"]) == 1
    assert result["func_path"][0][0] == key
    assert result["func_path"][0][1] == 2.0  # total_time_ms: 2_000_000 / 1_000_000
    assert result["func_path"][0][2] == 1.0  # func_time_ms: 1_000_000 / 1_000_000
    assert result["func_path"][0][3] == 50.0  # percentage: (1_000_000 / 2_000_000) * 100


def test_multiple_functions_multiple_benchmarks():
    """Test with multiple functions and benchmarks, sorted by percentage."""
    key1 = BenchmarkKey("module1.py", "func1")
    key2 = BenchmarkKey("module2.py", "func2")

    function_timings = {
        "func_path_1": {key1: 1_000_000.0, key2: 500_000.0},
        "func_path_2": {key1: 500_000.0, key2: 1_500_000.0},
    }
    total_timings = {key1: 2_000_000.0, key2: 2_000_000.0}

    result = validate_and_format_benchmark_table(function_timings, total_timings)  # 5.65μs -> 5.53μs (2.19% faster)

    # Check func_path_1: key1 (50%) should come before key2 (25%)
    assert len(result["func_path_1"]) == 2
    assert result["func_path_1"][0][3] == 50.0
    assert result["func_path_1"][1][3] == 25.0

    # Check func_path_2: key2 (75%) should come before key1 (25%)
    assert len(result["func_path_2"]) == 2
    assert result["func_path_2"][0][3] == 75.0
    assert result["func_path_2"][1][3] == 25.0


def test_nanoseconds_to_milliseconds_conversion():
    """Test that nanoseconds are correctly converted to milliseconds."""
    key = BenchmarkKey("module.py", "func")
    function_timings = {"func_path": {key: 5_000_000.0}}  # 5 million ns
    total_timings = {key: 10_000_000.0}  # 10 million ns

    result = validate_and_format_benchmark_table(function_timings, total_timings)  # 3.20μs -> 3.20μs (0.000% faster)

    # 5_000_000 / 1_000_000 = 5.0 ms
    # 10_000_000 / 1_000_000 = 10.0 ms
    assert result["func_path"][0][1] == 10.0
    assert result["func_path"][0][2] == 5.0


def test_percentage_calculation():
    """Test correct percentage calculation."""
    key = BenchmarkKey("module.py", "func")
    function_timings = {"func_path": {key: 250_000.0}}
    total_timings = {key: 1_000_000.0}

    result = validate_and_format_benchmark_table(function_timings, total_timings)  # 3.15μs -> 3.12μs (0.963% faster)

    # (250_000 / 1_000_000) * 100 = 25.0%
    assert result["func_path"][0][3] == 25.0


def test_sorting_by_percentage_descending():
    """Test that results are sorted by percentage in descending order."""
    key1 = BenchmarkKey("module1.py", "func1")
    key2 = BenchmarkKey("module2.py", "func2")
    key3 = BenchmarkKey("module3.py", "func3")

    function_timings = {
        "func_path": {
            key1: 300_000.0,  # 30%
            key2: 100_000.0,  # 10%
            key3: 500_000.0,  # 50%
        }
    }
    total_timings = {key1: 1_000_000.0, key2: 1_000_000.0, key3: 1_000_000.0}

    result = validate_and_format_benchmark_table(function_timings, total_timings)  # 4.74μs -> 4.64μs (2.18% faster)

    # Should be sorted: 50%, 30%, 10%
    percentages = [item[3] for item in result["func_path"]]
    assert percentages == [50.0, 30.0, 10.0]


def test_func_time_greater_than_total_time():
    """Test when func_time exceeds total_time (multithreading scenario)."""
    key = BenchmarkKey("module.py", "func")
    function_timings = {"func_path": {key: 3_000_000.0}}
    total_timings = {key: 2_000_000.0}

    result = validate_and_format_benchmark_table(function_timings, total_timings)  # 10.2μs -> 4.06μs (151% faster)

    # Should return zeros for this case
    assert result["func_path"][0] == (key, 0.0, 0.0, 0.0)


def test_total_time_zero():
    """Test when total_time is zero and other benchmarks exist."""
    key1 = BenchmarkKey("module.py", "func1")
    key2 = BenchmarkKey("module.py", "func2")
    function_timings = {"func_path": {key1: 1_000_000.0, key2: 500_000.0}}
    total_timings = {
        key1: 2_000_000.0,
        key2: 0,  # Zero total time
    }

    result = validate_and_format_benchmark_table(function_timings, total_timings)  # 10.2μs -> 4.54μs (124% faster)

    # key2 with zero total_time should be filtered out
    assert len(result["func_path"]) == 1
    assert result["func_path"][0][0] == key1
    assert result["func_path"][0][3] == 50.0


def test_missing_benchmark_key_in_total_timings():
    """Test when a benchmark key exists in function_timings but not in total_timings."""
    key1 = BenchmarkKey("module1.py", "func1")
    key2 = BenchmarkKey("module2.py", "func2")

    function_timings = {"func_path": {key1: 1_000_000.0, key2: 500_000.0}}
    total_timings = {key1: 2_000_000.0}  # key2 is missing

    result = validate_and_format_benchmark_table(function_timings, total_timings)  # 10.6μs -> 4.40μs (141% faster)

    # key2 should be treated as having total_time of 0, so it's skipped
    assert len(result["func_path"]) == 1
    assert result["func_path"][0][0] == key1


def test_very_small_percentage():
    """Test with very small percentage values."""
    key = BenchmarkKey("module.py", "func")
    function_timings = {"func_path": {key: 1_000.0}}  # 0.001 ms
    total_timings = {key: 1_000_000_000.0}  # 1000 ms

    result = validate_and_format_benchmark_table(function_timings, total_timings)  # 3.22μs -> 2.91μs (10.7% faster)

    # (1_000 / 1_000_000_000) * 100 = 0.0001%
    percentage = result["func_path"][0][3]
    assert percentage > 0
    assert percentage < 0.001


def test_very_large_percentage():
    """Test with percentage close to 100."""
    key = BenchmarkKey("module.py", "func")
    function_timings = {"func_path": {key: 999_000_000.0}}
    total_timings = {key: 1_000_000_000.0}

    result = validate_and_format_benchmark_table(function_timings, total_timings)  # 3.36μs -> 3.04μs (10.5% faster)

    percentage = result["func_path"][0][3]
    assert percentage == pytest.approx(99.9, rel=1e-5)


def test_equal_func_and_total_time():
    """Test when func_time equals total_time (100% percentage)."""
    key = BenchmarkKey("module.py", "func")
    function_timings = {"func_path": {key: 5_000_000.0}}
    total_timings = {key: 5_000_000.0}

    result = validate_and_format_benchmark_table(function_timings, total_timings)  # 3.10μs -> 3.04μs (1.98% faster)

    assert result["func_path"][0][3] == 100.0


def test_zero_func_time():
    """Test when func_time is zero."""
    key = BenchmarkKey("module.py", "func")
    function_timings = {"func_path": {key: 0.0}}
    total_timings = {key: 1_000_000.0}

    result = validate_and_format_benchmark_table(function_timings, total_timings)  # 3.16μs -> 2.94μs (7.53% faster)

    # Should have 0% percentage
    assert result["func_path"][0][3] == 0.0
    assert result["func_path"][0][2] == 0.0


def test_multiple_functions_same_name_different_modules():
    """Test functions with same name but different module paths."""
    key1 = BenchmarkKey("module1.py", "func")
    key2 = BenchmarkKey("module2.py", "func")

    function_timings = {"func_path_1": {key1: 1_000_000.0}, "func_path_2": {key2: 1_000_000.0}}
    total_timings = {key1: 2_000_000.0, key2: 2_000_000.0}

    result = validate_and_format_benchmark_table(function_timings, total_timings)  # 4.67μs -> 4.28μs (9.14% faster)

    assert len(result) == 2
    assert result["func_path_1"][0][0] == key1
    assert result["func_path_2"][0][0] == key2


def test_nanoseconds_boundary_values():
    """Test with various nanosecond boundary values."""
    key = BenchmarkKey("module.py", "func")
    function_timings = {"func_path": {key: 1_000_000.0}}
    total_timings = {key: 1_000_000.0}

    result = validate_and_format_benchmark_table(function_timings, total_timings)  # 3.18μs -> 3.08μs (3.28% faster)

    # 1_000_000 / 1_000_000 = 1.0 ms
    assert result["func_path"][0][1] == 1.0
    assert result["func_path"][0][2] == 1.0


def test_mixed_valid_and_invalid_benchmarks():
    """Test mixture of valid benchmarks and those exceeding total time."""
    key1 = BenchmarkKey("module1.py", "func1")
    key2 = BenchmarkKey("module2.py", "func2")

    function_timings = {
        "func_path": {
            key1: 1_000_000.0,  # Valid: 50%
            key2: 3_000_000.0,  # Invalid: exceeds total
        }
    }
    total_timings = {key1: 2_000_000.0, key2: 2_000_000.0}

    result = validate_and_format_benchmark_table(function_timings, total_timings)  # 11.2μs -> 4.63μs (141% faster)

    # Should contain both, with zeros for the invalid one
    assert len(result["func_path"]) == 2
    # After sorting, the 50% should come first, then the 0% case
    assert result["func_path"][0][3] == 50.0
    assert result["func_path"][1][3] == 0.0


def test_function_path_with_special_characters():
    """Test function paths with special characters."""
    key = BenchmarkKey("module.py", "func")
    function_timings = {"path/to/func_123-test": {key: 1_000_000.0}}
    total_timings = {key: 2_000_000.0}

    result = validate_and_format_benchmark_table(function_timings, total_timings)  # 3.26μs -> 3.23μs (0.930% faster)

    assert "path/to/func_123-test" in result
    assert len(result["path/to/func_123-test"]) == 1


def test_benchmark_key_with_special_characters():
    """Test BenchmarkKey with special characters in module and function names."""
    key = BenchmarkKey("test_module_v2.py", "test_func_name_123")
    function_timings = {"func_path": {key: 1_000_000.0}}
    total_timings = {key: 2_000_000.0}

    result = validate_and_format_benchmark_table(function_timings, total_timings)  # 3.13μs -> 3.04μs (2.66% faster)

    assert result["func_path"][0][0] == key


def test_float_precision_in_calculations():
    """Test floating point precision in percentage calculations."""
    key = BenchmarkKey("module.py", "func")
    # Test with values that might cause floating point precision issues
    function_timings = {"func_path": {key: 1.0}}
    total_timings = {key: 3.0}

    result = validate_and_format_benchmark_table(function_timings, total_timings)  # 3.41μs -> 3.32μs (2.71% faster)

    # (1.0 / 3.0) * 100 should be approximately 33.33333...
    percentage = result["func_path"][0][3]
    assert percentage == pytest.approx(33.33333333333333, rel=1e-10)


def test_large_number_of_functions():
    """Test with 100 different functions."""
    function_timings = {}
    total_timings = {}

    # Create 100 benchmark keys
    for i in range(100):
        key = BenchmarkKey(f"module{i}.py", f"func{i}")
        total_timings[key] = 2_000_000.0

    # Create a single function with 100 benchmarks
    benchmarks = {}
    for i in range(100):
        key = BenchmarkKey(f"module{i}.py", f"func{i}")
        benchmarks[key] = (i + 1) * 10_000.0  # Varying times

    function_timings["func_path"] = benchmarks

    result = validate_and_format_benchmark_table(function_timings, total_timings)  # 64.8μs -> 59.9μs (8.13% faster)

    assert len(result["func_path"]) == 100
    # Check that results are sorted in descending order
    percentages = [item[3] for item in result["func_path"]]
    assert percentages == sorted(percentages, reverse=True)


def test_large_number_of_function_paths():
    """Test with 100 different function paths."""
    key = BenchmarkKey("module.py", "func")
    function_timings = {}
    total_timings = {key: 2_000_000.0}

    # Create 100 function paths
    for i in range(100):
        function_timings[f"func_path_{i}"] = {key: 1_000_000.0}

    result = validate_and_format_benchmark_table(function_timings, total_timings)  # 71.1μs -> 65.2μs (9.20% faster)

    assert len(result) == 100
    # Each should have exactly one entry with 50% percentage
    for func_path in result:
        assert len(result[func_path]) == 1
        assert result[func_path][0][3] == 50.0


def test_large_number_of_benchmarks_per_function():
    """Test with 100 benchmarks for a single function."""
    function_timings = {}
    total_timings = {}

    benchmarks = {}
    for i in range(100):
        key = BenchmarkKey(f"module{i}.py", f"func{i}")
        # Vary the times to create different percentages
        func_time = (i + 1) * 1_000.0
        total_time = 100_000.0
        benchmarks[key] = func_time
        total_timings[key] = total_time

    function_timings["func_path"] = benchmarks

    result = validate_and_format_benchmark_table(function_timings, total_timings)  # 41.7μs -> 35.9μs (16.1% faster)

    assert len(result["func_path"]) == 100
    # Check sorting
    percentages = [item[3] for item in result["func_path"]]
    assert percentages == sorted(percentages, reverse=True)


def test_complex_scenario_many_functions_many_benchmarks():
    """Test complex scenario with 50 functions and 10 benchmarks each."""
    function_timings = {}
    total_timings = {}

    # Create 10 benchmark keys
    benchmark_keys = []
    for i in range(10):
        key = BenchmarkKey(f"module{i}.py", f"func{i}")
        benchmark_keys.append(key)
        total_timings[key] = 2_000_000.0

    # Create 50 function paths
    for func_idx in range(50):
        benchmarks = {}
        for bench_idx, key in enumerate(benchmark_keys):
            # Vary times based on both function and benchmark
            func_time = ((func_idx + 1) * (bench_idx + 1)) * 10_000.0
            benchmarks[key] = func_time
        function_timings[f"func_path_{func_idx}"] = benchmarks

    result = validate_and_format_benchmark_table(function_timings, total_timings)  # 475μs -> 196μs (142% faster)

    assert len(result) == 50
    for func_path in result:
        assert len(result[func_path]) == 10
        # Check sorting for each function
        percentages = [item[3] for item in result[func_path]]
        assert percentages == sorted(percentages, reverse=True)


def test_large_nanosecond_values():
    """Test with very large nanosecond values (simulating real benchmarks)."""
    key = BenchmarkKey("module.py", "func")
    # Simulate realistic benchmark times in nanoseconds (seconds converted to ns)
    function_timings = {"func_path": {key: 1_500_000_000_000.0}}  # 1.5 seconds in ns
    total_timings = {key: 3_000_000_000_000.0}  # 3 seconds in ns

    result = validate_and_format_benchmark_table(function_timings, total_timings)  # 3.30μs -> 3.13μs (5.44% faster)

    # Check conversions: 1.5e12 / 1e6 = 1.5e6 ms, 3e12 / 1e6 = 3e6 ms
    assert result["func_path"][0][1] == pytest.approx(3_000_000.0)
    assert result["func_path"][0][2] == pytest.approx(1_500_000.0)
    assert result["func_path"][0][3] == 50.0


def test_stability_with_identical_percentages():
    """Test stability when multiple benchmarks have identical percentages."""
    key1 = BenchmarkKey("module1.py", "func1")
    key2 = BenchmarkKey("module2.py", "func2")
    key3 = BenchmarkKey("module3.py", "func3")

    function_timings = {
        "func_path": {
            key1: 500_000.0,  # 25%
            key2: 500_000.0,  # 25%
            key3: 500_000.0,  # 25%
        }
    }
    total_timings = {key1: 2_000_000.0, key2: 2_000_000.0, key3: 2_000_000.0}

    result = validate_and_format_benchmark_table(function_timings, total_timings)  # 4.52μs -> 4.22μs (7.14% faster)

    # All should have 25% and should be present
    assert len(result["func_path"]) == 3
    percentages = [item[3] for item in result["func_path"]]
    assert all(p == 25.0 for p in percentages)


def test_many_zero_timings():
    """Test performance with many zero timings."""
    function_timings = {}
    total_timings = {}

    benchmarks = {}
    for i in range(50):
        key = BenchmarkKey(f"module{i}.py", f"func{i}")
        if i < 25:
            benchmarks[key] = 0.0
            total_timings[key] = 1_000_000.0
        else:
            benchmarks[key] = 500_000.0
            total_timings[key] = 1_000_000.0

    function_timings["func_path"] = benchmarks

    result = validate_and_format_benchmark_table(function_timings, total_timings)  # 23.1μs -> 21.2μs (8.88% faster)

    assert len(result["func_path"]) == 50
    # All with non-zero times should come before zeros
    percentages = [item[3] for item in result["func_path"]]
    assert percentages == sorted(percentages, reverse=True)


def test_mixed_scales_in_single_result():
    """Test handling of mixed time scales in nanoseconds."""
    key1 = BenchmarkKey("module1.py", "func1")
    key2 = BenchmarkKey("module2.py", "func2")

    function_timings = {
        "func_path": {
            key1: 1_000.0,  # 1 microsecond
            key2: 1_000_000_000.0,  # 1 second
        }
    }
    total_timings = {key1: 2_000.0, key2: 2_000_000_000.0}

    result = validate_and_format_benchmark_table(function_timings, total_timings)  # 3.95μs -> 3.84μs (2.89% faster)

    # Both should calculate correctly despite different scales
    assert result["func_path"][0][3] == pytest.approx(50.0)
    assert result["func_path"][1][3] == pytest.approx(50.0)

To edit these changes, `git checkout codeflash/optimize-pr1941-2026-04-02T18.50.56` and push.


@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Apr 2, 2026

claude bot commented Apr 2, 2026

Claude finished @codeflash-ai[bot]'s task in 1m 42s.


PR Review Summary

Prek Checks

✅ All checks passed (ruff check, ruff format).

Code Review

Optimization correctness: ✅ Valid

The three optimizations are all sound:

  • logger.isEnabledFor(logging.DEBUG) guard defers f-string construction when debug logging is disabled — standard Python pattern
  • itemgetter(3) replaces lambda x: x[3] with a C-level callable, reducing per-comparison overhead
  • Hoisting scale = 1_000_000.0 outside the loop avoids repeated float literal construction
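The lambda-vs-`itemgetter` gap in the second bullet is easy to reproduce as a sanity check with `timeit`; these are not the PR's profiler numbers, and the exact ratio will vary by machine and Python version:

```python
import timeit
from operator import itemgetter

# 4-tuples shaped like the benchmark rows; column 3 is the sort key
rows = [(i, 0.0, 0.0, (i * 37) % 100) for i in range(1000)]

t_lambda = timeit.timeit(lambda: sorted(rows, key=lambda x: x[3]), number=200)
t_getter = timeit.timeit(lambda: sorted(rows, key=itemgetter(3)), number=200)

# Both key functions must produce the identical ordering
assert sorted(rows, key=itemgetter(3)) == sorted(rows, key=lambda x: x[3])
print(f"lambda: {t_lambda:.4f}s  itemgetter: {t_getter:.4f}s")
```

On CPython, `itemgetter(3)` is typically faster because each comparison dispatches to a C-implemented callable rather than entering a Python frame for the lambda.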

Minor issue fixed: The generated code introduced duplicate comments — # Process each function's benchmark data appeared twice (lines 24 and 26) and the # If the function time is greater than total time... block appeared twice (lines 33–34 and 39–40). I've cleaned these up and pushed the fix.

Duplicate Detection

No duplicates detected.

Test Coverage

SMALL PR — coverage step skipped per policy. The PR body confirms 9 existing unit tests pass and 34 generated regression tests pass with 100% coverage.


Fix pushed: style: remove duplicate comments introduced by optimization

claude bot merged commit ab728f7 into cf-compare-copy-benchmarks Apr 3, 2026 (25 of 27 checks passed)
claude bot deleted the codeflash/optimize-pr1941-2026-04-02T18.50.56 branch April 3, 2026 02:22