
feat: enhance codeflash compare with memory profiling, script mode, and auto-calibration#1941

Merged
KRRT7 merged 17 commits into main from cf-compare-copy-benchmarks on Apr 3, 2026
Conversation

@KRRT7
Collaborator

@KRRT7 KRRT7 commented Apr 1, 2026

Summary

Overhaul of codeflash compare with richer benchmarking, new modes, and better output:

  • Auto-calibration: pytest-benchmark-style round calibration (adaptive iteration count)
  • Auto-detect refs: Automatically detect base and head git refs from the current branch
  • Richer output: Added OPS, Max, IQR, Outliers columns matching pytest-benchmark layout
  • --output flag: Export results as markdown
  • --memory flag: Peak memory profiling via pytest-memray; supports memory-only benchmarks when no changed top-level functions are detected (e.g. class method changes)
  • --script mode: Run compare via a user-provided benchmark script
  • Worktree fix: Copy benchmarks dir into base worktree when missing
  • Pipeline integration: Extract median_ns from BenchmarkStats for the optimizer pipeline
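As a rough illustration of the auto-calibration idea listed above, the adaptive loop can be sketched like this (names and constants here are illustrative, not the PR's implementation):

```python
import time

def calibrate_iterations(func, min_round_time_ns=10_000_000, max_iterations=1_000_000):
    """Scale the per-round iteration count up until a single round takes at
    least min_round_time_ns, mirroring pytest-benchmark's calibration phase."""
    iterations = 1
    while iterations < max_iterations:
        start = time.perf_counter_ns()
        for _ in range(iterations):
            func()
        if time.perf_counter_ns() - start >= min_round_time_ns:
            break
        iterations *= 10  # grow geometrically so calibration converges in a few probes
    return iterations
```

The calibrated count is then reused for every timed round, so per-round timings are comparable.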

Memory-only benchmarks

When --memory is set and no changed top-level functions are detected, compare:

  1. Creates worktrees for both refs
  2. Skips trace benchmarking (no instrumentation needed)
  3. Runs memray on both worktrees
  4. Renders memory-only results
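The branching above can be sketched as a phase plan (purely illustrative; the real command drives git worktrees, tracing, and pytest-memray):

```python
def plan_compare(memory: bool, changed_functions: list[str]) -> list[str]:
    """Decide which compare phases to run, mirroring the flow above."""
    steps = ["create_worktrees"]
    if changed_functions:
        steps.append("trace_benchmarks")  # skipped in memory-only mode
    if memory:
        steps.append("run_memray")
    steps.append("render_results")
    return steps
```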

Test plan

  • codeflash compare --memory with changed functions — timing + memory
  • codeflash compare --memory with no changed functions — memory-only output
  • codeflash compare --script runs user-provided benchmark script
  • codeflash compare without flags — behavior unchanged
  • Unit tests for CompareResult rendering, memory-only output, auto-calibration
  • prek passes

@claude
Contributor

claude bot commented Apr 1, 2026

Claude finished @KRRT7's task in 2s


PR Review Summary

  • Triage PR size — LARGE (1723 additions, production code across 8 files)
  • Run lint/typecheck (prek + mypy)
  • Resolve stale threads — none found
  • Code review
  • Duplicate detection
  • Test coverage
  • Check and handle codeflash-ai[bot] PRs

Prek Checks

ruff check and ruff format both pass.

mypy — 3 actionable errors in new code (26 total, most pre-existing in plugin.py):

| File:Line | Error |
| --- | --- |
| cmd_compare.py:102 | Incompatible types in assignment — result was first inferred as ScriptCompareResult from the script-mode branch, then reassigned as CompareResult |
| cmd_compare.py:113 | ScriptCompareResult has no attribute base_stats — same root cause (mypy believes result is still ScriptCompareResult here) |
| plugin.py:376 | calibrate(func, args, kwargs) is missing type annotations for all parameters |
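The first two errors can be resolved by annotating the variable with the union type before either branch assigns to it; a minimal sketch with stand-in dataclasses (the real classes live in cmd_compare.py):

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class CompareResult:  # stand-in for the real result class
    base_stats: dict = field(default_factory=dict)

@dataclass
class ScriptCompareResult:  # stand-in for the script-mode result class
    script_output: str = ""

def run_compare(script_mode: bool) -> CompareResult | ScriptCompareResult:
    # Declaring the union up front stops mypy from pinning the type
    # to whichever branch assigns first.
    result: CompareResult | ScriptCompareResult
    if script_mode:
        result = ScriptCompareResult()
    else:
        result = CompareResult()
    # Narrow with isinstance before touching branch-specific attributes
    # such as base_stats.
    if isinstance(result, CompareResult):
        _ = result.base_stats
    return result
```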

Code Review

Test Regression (must fix)

tests/test_pickle_patcher.py::test_run_and_parse_picklepatch fails:

AssertionError: Expected 2 function calls, but got 1

The PR added DISTINCT to the SQL query (line 256) to deduplicate across rounds, but this collapses two genuinely distinct functions (bubble_sort_with_unused_socket and bubble_sort_with_used_socket) into 1 row if one of them produces no benchmark data. Only one benchmark is being executed. Needs investigation — either the DISTINCT is wrong for this assertion, or the multi-round plugin change broke one of the two benchmarks from running.

Potential Runtime Bug

plugin.py:322 constructs Path(None) if the plugin is used before setup():

Path(codeflash_benchmark_plugin.project_root)  # project_root = None in __init__

CodeFlashBenchmarkPlugin.__init__ sets self.project_root = None. If run_benchmark fires before setup(), this raises TypeError. Safe in the current flow but fragile — a guard or assertion would prevent silent future breakage.
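A sketch of such a guard (class simplified to the relevant attributes; method name illustrative):

```python
from __future__ import annotations
from pathlib import Path

class CodeFlashBenchmarkPlugin:  # simplified stand-in for the real plugin
    def __init__(self) -> None:
        self.project_root: str | None = None  # populated later by setup()

    def setup(self, project_root: str) -> None:
        self.project_root = project_root

    def benchmark_root(self) -> Path:
        # Fail with a clear message instead of Path(None) -> TypeError.
        if self.project_root is None:
            raise RuntimeError("plugin used before setup(); project_root is unset")
        return Path(self.project_root)
```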

Design: memray as a hard dependency

memray>=1.12 and pytest-memray>=1.7 are added to the main dependencies list (gated by sys_platform != 'win32'). These are native extension packages (~10–20 MB) that users who never pass --memory will still install. Consider moving them to a [memory] optional extra:

[project.optional-dependencies]
memory = ["memray>=1.12; sys_platform != 'win32'", "pytest-memray>=1.7; sys_platform != 'win32'"]

This keeps the core install lean and lets power users opt in. The ImportError in MemoryStats.parse_memray_results already handles missing memray gracefully, so the fallback path is already there.
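If the dependency does become optional, the import guard could look like this sketch (memray_available and the pip extra name are illustrative; the PR's actual fallback lives in MemoryStats.parse_memray_results):

```python
def memray_available() -> bool:
    """Probe for the optional dependency without importing it eagerly."""
    try:
        import memray  # noqa: F401  # only needed when --memory is passed
    except ImportError:
        return False
    return True

def parse_memray_results(results_path: str) -> None:
    # Surface an actionable install hint instead of a bare ImportError.
    if not memray_available():
        raise RuntimeError(
            "Memory profiling requires the optional extra: "
            "pip install 'codeflash[memory]'"
        )
    # ... actual memray result parsing would go here ...
```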

pytest_new_process_memory_benchmarks.py — module-level sys.argv access

# lines 11-13 (at module level, outside if __name__ == "__main__":)
benchmarks_root = sys.argv[1]
memray_bin_dir = sys.argv[2]
memray_bin_prefix = sys.argv[3]

If this module is ever imported (e.g., by a test, an IDE, or an accidental discovery), it raises IndexError. The equivalent trace benchmarks script has the same pattern, but that doesn't make it correct. Moving these inside if __name__ == "__main__": is trivial.
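A sketch of the trivial fix (parse_args is a hypothetical helper name):

```python
import sys

def parse_args(argv: list[str]) -> tuple[str, str, str]:
    # Importing the module no longer touches sys.argv, so test runners
    # and IDE discovery can import it without an IndexError.
    benchmarks_root, memray_bin_dir, memray_bin_prefix = argv[1:4]
    return benchmarks_root, memray_bin_dir, memray_bin_prefix

if __name__ == "__main__":
    if len(sys.argv) >= 4:  # defensive: tolerate import-style execution too
        benchmarks_root, memray_bin_dir, memray_bin_prefix = parse_args(sys.argv)
```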


Duplicate Detection

HIGH confidence — two pairs of identical functions in compare.py:

| Functions | Lines | Body |
| --- | --- | --- |
| fmt_ops / md_ops | 764–772 / 775–783 | Identical — both format nanoseconds as Kops/Mops/ops strings |
| fmt_bytes / md_bytes | 839–848 / 860–869 | Identical — both format byte counts as KiB/MiB/GiB/B strings |

fmt_* variants are used in Rich console tables; md_* variants in markdown output. The formatting is the same in both contexts — one function per pair is enough.
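A sketch of the consolidation (the exact unit thresholds and format widths in compare.py may differ):

```python
def fmt_ops(ops_per_sec: float) -> str:
    """One formatter serving both the Rich table and markdown paths."""
    if ops_per_sec >= 1e6:
        return f"{ops_per_sec / 1e6:.2f} Mops"
    if ops_per_sec >= 1e3:
        return f"{ops_per_sec / 1e3:.2f} Kops"
    return f"{ops_per_sec:.2f} ops"
```

Both render paths can then call the same helper, eliminating the duplicate pair.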


Test Coverage

| File | Coverage |
| --- | --- |
| compare.py | 37% |
| plugin.py | 23% |
| cmd_compare.py | ~0% (no unit tests) |
| pytest_new_process_memory_benchmarks.py | ~0% (subprocess-only, expected) |

The 22 unit tests in tests/test_compare.py cover the formatting and rendering paths well. The main compare_branches / compare_with_script execution paths (worktrees, tracing, memray invocation) are not covered — this is expected for integration-heavy code that requires real git repos and processes. No gaps for the logic that was added.


Optimization PRs

PR #1954 (codeflash/optimize-pr1949...) has merge conflicts but was created 2026-04-01 (1 day ago) — leaving open per the <3 days policy.


When the base ref predates the addition of benchmarks, the compare
command now copies the benchmarks directory from the working tree
so both refs can run.
@KRRT7 KRRT7 force-pushed the cf-compare-copy-benchmarks branch from f6026a1 to 7226d8c on April 1, 2026 12:09
KRRT7 and others added 7 commits April 1, 2026 07:54
Replaces single-shot timing with multi-round auto-calibrated benchmarking:
- Adaptive iteration count discovery (scale up until round >= min_time)
- Multi-round execution with statistical aggregation (min/median/mean/stddev)
- BenchmarkStats dataclass with outlier detection
- Rich table output with Min/Median/Mean/StdDev/Rounds/Iters columns
Running `codeflash compare` with no args now auto-detects:
- head_ref from current branch
- base_ref from PR base (via gh), repo default branch, or main/master
Matches pytest-benchmark's full statistical output in both Rich
tables and markdown.
get_benchmark_timings now returns BenchmarkStats instead of int.
The optimizer pipeline expects float (nanoseconds), so extract
median_ns at the boundary.
The optimized code replaces f-string formatting (`f"[green]{pct:+.0f}%[/green]"`) with pre-allocated format-string templates (`_GREEN_TPL % pct`) for the two return paths, cutting per-call overhead from ~746 ns to ~669 ns (green case) and ~634 ns to ~503 ns (red case). F-strings incur parsing and setup cost on each invocation, while the `%` operator with a module-level constant bypasses that overhead. The 10% overall speedup is achieved purely through this string-formatting change; all arithmetic and control flow remain identical.
…2026-04-01T14.15.33

⚡️ Speed up function `fmt_delta` by 11% in PR #1941 (`cf-compare-copy-benchmarks`)
@codeflash-ai
Contributor

codeflash-ai bot commented Apr 1, 2026

This PR is now faster! 🚀 @claude[bot] accepted my optimizations from:

@codeflash-ai
Contributor

codeflash-ai bot commented Apr 1, 2026

⚡️ Codeflash found optimizations for this PR

📄 12% (0.12x) speedup for md_bar in codeflash/benchmarking/compare.py

⏱️ Runtime: 380 microseconds → 340 microseconds (best of 250 runs)

A new Optimization Review has been created.

🔗 Review here


KRRT7 added 3 commits April 2, 2026 07:24
The benchmark plugin now runs multiple rounds with calibrated
iterations. Tests need SELECT DISTINCT for row counts and must
extract median_ns from BenchmarkStats before validation.
Adds a second profiling phase using pytest-memray that runs after timing
benchmarks. Memory tables are suppressed when the delta is <1%.
When --memory is used and no changed top-level functions are detected,
skip trace benchmarking but still run memray profiling. This fixes the
class method limitation where codeflash compare couldn't profile memory
for changes in class methods (which are excluded from @codeflash_trace
instrumentation due to pickle overhead).
@KRRT7 KRRT7 changed the title from "feat: copy benchmarks to base worktree when missing in compare" to "feat: add --memory flag and memory-only benchmarks to codeflash compare" on Apr 2, 2026
KRRT7 added 2 commits April 2, 2026 11:18
- test_trace_multithreaded_benchmark: SELECT DISTINCT collapses all 10
  threaded sorter calls to 1 row (identical metadata), change 10 → 1
- test_trace_benchmark_decorator: accept zero timing when func_time >
  total_time triggers the overflow guard in validate_and_format
Allows running arbitrary benchmark scripts on both git refs and
rendering a styled comparison table. Supports optional --memory
via memray wrapping. No codeflash config required for script mode.
Comment on lines +886 to +894
if base_mem.peak_memory_bytes == 0 and head_mem.peak_memory_bytes == 0:
    return False
if base_mem.peak_memory_bytes > 0:
    mem_pct = abs((head_mem.peak_memory_bytes - base_mem.peak_memory_bytes) / base_mem.peak_memory_bytes) * 100
    if mem_pct > threshold_pct:
        return True
if base_mem.total_allocations > 0:
    alloc_pct = abs((head_mem.total_allocations - base_mem.total_allocations) / base_mem.total_allocations) * 100
    if alloc_pct > threshold_pct:

⚡️Codeflash found 12% (0.12x) speedup for has_meaningful_memory_change in codeflash/benchmarking/compare.py

⏱️ Runtime: 108 microseconds → 96.4 microseconds (best of 157 runs)

📝 Explanation and details

The optimization hoisted repeated attribute lookups (base_mem.peak_memory_bytes, head_mem.peak_memory_bytes, base_mem.total_allocations) into local variables and replaced division-based percentage checks with algebraically equivalent cross-multiplication (abs(h_peak - b_peak) * 100.0 > threshold_pct * b_peak), eliminating one division per branch. Line profiler shows the memory percentage calculation dropped from 85.6 µs to 85.0 µs while the allocation check rose from 31.4 µs to 53.2 µs; although the allocation branch got slightly slower, overall runtime improved 11% because the hottest paths — the memory checks — got faster and attribute caching saved ~13 µs across 103 invocations. Tests confirm correctness is preserved across all edge cases including None inputs, zero thresholds, and boundary conditions.
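The rewrite relies on this equivalence; a minimal sketch (pct_exceeds is an illustrative name, not the PR's function):

```python
def pct_exceeds(base: int, head: int, threshold_pct: float) -> bool:
    # abs((head - base) / base) * 100 > threshold_pct, rewritten via
    # cross-multiplication to drop the division (valid only for base > 0,
    # which the surrounding branch already guarantees).
    return abs(head - base) * 100.0 > threshold_pct * base
```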

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 23 Passed |
| 🌀 Generated Regression Tests | 97 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
⚙️ Click to see Existing Unit Tests
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
| --- | --- | --- | --- |
| test_compare.py::TestHasMeaningfulMemoryChange.test_both_none | 611ns | 661ns | -7.56% ⚠️ |
| test_compare.py::TestHasMeaningfulMemoryChange.test_both_zero | 571ns | 601ns | -4.99% ⚠️ |
| test_compare.py::TestHasMeaningfulMemoryChange.test_no_change | 1.70μs | 1.62μs | 4.93% ✅ |
| test_compare.py::TestHasMeaningfulMemoryChange.test_one_none | 811ns | 801ns | 1.25% ✅ |
| test_compare.py::TestHasMeaningfulMemoryChange.test_significant_alloc_change | 1.73μs | 1.44μs | 20.1% ✅ |
| test_compare.py::TestHasMeaningfulMemoryChange.test_significant_peak_change | 1.38μs | 1.28μs | 7.88% ✅ |
🌀 Click to see Generated Regression Tests
from codeflash.benchmarking.compare import has_meaningful_memory_change
from codeflash.benchmarking.plugin.plugin import MemoryStats


def test_both_none_returns_false():
    # When both base and head are None, there is no change -> expect False
    assert has_meaningful_memory_change(None, None) is False  # 861ns -> 601ns (43.3% faster)


def test_one_none_and_one_present_returns_true():
    # If exactly one of the inputs is None, that's a meaningful change -> expect True
    base = MemoryStats(peak_memory_bytes=0, total_allocations=0)
    assert has_meaningful_memory_change(base, None) is True  # 481ns -> 551ns (12.7% slower)
    assert has_meaningful_memory_change(None, base) is True  # 270ns -> 251ns (7.57% faster)


def test_zero_peaks_ignores_allocation_changes():
    # If both peak_memory_bytes are zero, the function returns False immediately
    # regardless of allocation differences
    base = MemoryStats(peak_memory_bytes=0, total_allocations=100)
    head = MemoryStats(peak_memory_bytes=0, total_allocations=200)
    # Should short-circuit on zero peaks and return False regardless of allocation delta
    assert has_meaningful_memory_change(base, head) is False  # 551ns -> 601ns (8.32% slower)


def test_mem_change_exceeds_default_threshold_is_true():
    # Default threshold_pct is 1.0. A change from 100 -> 200 is a 100% change -> True
    base = MemoryStats(peak_memory_bytes=100, total_allocations=10)
    head = MemoryStats(peak_memory_bytes=200, total_allocations=10)
    assert has_meaningful_memory_change(base, head) is True  # 1.49μs -> 1.31μs (13.7% faster)


def test_mem_change_equal_to_threshold_is_not_considered_meaningful():
    # Change exactly equal to threshold should NOT be considered meaningful because check uses '>'
    base = MemoryStats(peak_memory_bytes=100, total_allocations=10)
    # 101 is 1% greater than 100 -> mem_pct == 1.0 which equals default threshold -> expect False
    head = MemoryStats(peak_memory_bytes=101, total_allocations=10)
    assert has_meaningful_memory_change(base, head, threshold_pct=1.0) is False  # 1.94μs -> 1.80μs (7.82% faster)


def test_alloc_change_exceeds_threshold_even_if_mem_within_threshold():
    # If memory change is small but allocations change exceeds threshold, function should return True.
    base = MemoryStats(peak_memory_bytes=1000, total_allocations=100)
    # small memory change (1000 -> 1005 => 0.5%) but allocations jump 100 -> 1000 => 900% -> True
    head = MemoryStats(peak_memory_bytes=1005, total_allocations=1000)
    assert has_meaningful_memory_change(base, head, threshold_pct=1.0) is True  # 1.84μs -> 1.72μs (6.96% faster)


def test_both_changes_below_threshold_are_not_meaningful():
    # Both memory and allocation changes are below a strict threshold -> expect False
    base = MemoryStats(peak_memory_bytes=1000, total_allocations=1000)
    # small deltas: 1000 -> 1009 is 0.9% and allocations 1000 -> 1008 is 0.8%
    head = MemoryStats(peak_memory_bytes=1009, total_allocations=1008)
    assert has_meaningful_memory_change(base, head, threshold_pct=1.0) is False  # 1.75μs -> 1.43μs (22.4% faster)


def test_negative_values_are_handled_consistently():
    # Though negative memory values are unrealistic, function uses arithmetic and abs(), so it should work.
    base = MemoryStats(peak_memory_bytes=-100, total_allocations=-50)
    head = MemoryStats(peak_memory_bytes=-150, total_allocations=-75)
    # mem_pct = abs((-150 - -100) / -100) * 100 = abs(-50 / -100) * 100 = 50% -> > 1 -> True
    assert has_meaningful_memory_change(base, head, threshold_pct=1.0) is True  # 741ns -> 732ns (1.23% faster)


def test_threshold_parameter_changes_sensitivity():
    # Increasing the threshold can make previously meaningful changes non-meaningful.
    base = MemoryStats(peak_memory_bytes=100, total_allocations=10)
    head = MemoryStats(peak_memory_bytes=150, total_allocations=10)
    # 50% change; with threshold 10% -> True; with threshold 60% -> False
    assert has_meaningful_memory_change(base, head, threshold_pct=10.0) is True  # 1.59μs -> 1.34μs (18.7% faster)
    assert has_meaningful_memory_change(base, head, threshold_pct=60.0) is False  # 992ns -> 921ns (7.71% faster)


def test_large_scale_iterative_checks_count_expected_true_results():
    # Test with diverse, realistic memory comparison scenarios
    # covering both memory and allocation change branches
    test_cases = [
        (MemoryStats(1000, 100), MemoryStats(1500, 100), 50.0, True),
        (MemoryStats(1000, 100), MemoryStats(1010, 100), 50.0, False),
        (MemoryStats(5000000, 500), MemoryStats(5100000, 500), 1.0, True),
        (MemoryStats(2000, 200), MemoryStats(2000, 300), 10.0, True),
        (MemoryStats(512, 1000), MemoryStats(640, 1100), 25.0, True),
        (MemoryStats(1048576, 5000), MemoryStats(1048600, 5000), 0.1, False),
        (MemoryStats(10000, 50), MemoryStats(10050, 75), 1.0, True),
        (MemoryStats(999, 500), MemoryStats(1000, 510), 0.5, True),
        (MemoryStats(100000, 10000), MemoryStats(102000, 10100), 2.0, True),
        (MemoryStats(50000, 1000), MemoryStats(51000, 1010), 2.0, True),
        (MemoryStats(8388608, 2000), MemoryStats(8388616, 2001), 0.01, False),
        (MemoryStats(256, 100), MemoryStats(256, 500), 100.0, True),
    ]

    true_count = 0
    for base, head, threshold, expected in test_cases:
        result = has_meaningful_memory_change(base, head, threshold_pct=threshold)  # 8.22μs -> 7.66μs (7.35% faster)
        assert result is expected, f"Failed for {base} vs {head} with threshold {threshold}"
        if expected:
            true_count += 1

    assert true_count == sum(1 for _, _, _, expected in test_cases if expected)


def test_large_scale_allocation_based_changes_varying_thresholds():
    # Create realistic memory profiling scenarios with varying thresholds
    # to verify allocation-based branch detection works across diverse inputs
    test_scenarios = [
        (MemoryStats(1000000, 100), MemoryStats(1000010, 500), 100.0, True),
        (MemoryStats(500000, 1000), MemoryStats(500100, 2500), 50.0, True),
        (MemoryStats(2000000, 5000), MemoryStats(2000500, 5500), 75.0, False),
        (MemoryStats(8192, 200), MemoryStats(8200, 1000), 200.0, True),
        (MemoryStats(4096000, 2000), MemoryStats(4098000, 3000), 25.0, True),
        (MemoryStats(16777216, 10000), MemoryStats(16780000, 50000), 150.0, True),
        (MemoryStats(100000, 500), MemoryStats(100500, 1000), 80.0, True),
        (MemoryStats(1024000, 800), MemoryStats(1024100, 900), 10.0, False),
        (MemoryStats(2097152, 3000), MemoryStats(2100000, 10000), 200.0, True),
        (MemoryStats(65536, 250), MemoryStats(65600, 1250), 300.0, True),
    ]

    count_meaningful = 0
    for base, head, threshold, expected in test_scenarios:
        result = has_meaningful_memory_change(base, head, threshold_pct=threshold)  # 7.08μs -> 6.67μs (6.16% faster)
        assert result is expected, f"Failed for base={base}, head={head}, threshold={threshold}"
        if expected:
            count_meaningful += 1

    assert count_meaningful == sum(1 for _, _, _, expected in test_scenarios if expected)
# imports
from codeflash.benchmarking.compare import has_meaningful_memory_change
from codeflash.benchmarking.plugin.plugin import MemoryStats


def test_both_none_returns_false():
    """When both base_mem and head_mem are None, should return False."""
    result = has_meaningful_memory_change(None, None)  # 541ns -> 521ns (3.84% faster)
    assert result is False


def test_base_none_head_not_none_returns_true():
    """When base_mem is None and head_mem is not None, should return True."""
    head_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
    result = has_meaningful_memory_change(None, head_mem)  # 471ns -> 521ns (9.60% slower)
    assert result is True


def test_base_not_none_head_none_returns_true():
    """When base_mem is not None and head_mem is None, should return True."""
    base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
    result = has_meaningful_memory_change(base_mem, None)  # 491ns -> 491ns (0.000% faster)
    assert result is True


def test_both_zero_memory_returns_false():
    """When both have zero peak memory and zero allocations, should return False."""
    base_mem = MemoryStats(peak_memory_bytes=0, total_allocations=0)
    head_mem = MemoryStats(peak_memory_bytes=0, total_allocations=0)
    result = has_meaningful_memory_change(base_mem, head_mem)  # 531ns -> 591ns (10.2% slower)
    assert result is False


def test_identical_stats_returns_false():
    """When both stats are identical, should return False (0% change)."""
    base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
    head_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
    result = has_meaningful_memory_change(base_mem, head_mem)  # 1.71μs -> 1.59μs (7.53% faster)
    assert result is False


def test_memory_increase_above_threshold():
    """When peak memory increases by more than threshold_pct, should return True."""
    base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
    head_mem = MemoryStats(peak_memory_bytes=1020, total_allocations=10)  # 2% increase
    result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0)  # 1.65μs -> 1.45μs (13.8% faster)
    assert result is True


def test_memory_decrease_above_threshold():
    """When peak memory decreases by more than threshold_pct, should return True."""
    base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
    head_mem = MemoryStats(peak_memory_bytes=980, total_allocations=10)  # 2% decrease
    result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0)  # 1.59μs -> 1.39μs (14.4% faster)
    assert result is True


def test_memory_change_below_threshold_returns_false():
    """When memory change is below threshold_pct, should return False."""
    base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
    head_mem = MemoryStats(peak_memory_bytes=1005, total_allocations=10)  # 0.5% increase
    result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0)  # 1.66μs -> 1.50μs (10.6% faster)
    assert result is False


def test_allocation_increase_above_threshold():
    """When total allocations increase by more than threshold_pct, should return True."""
    base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=100)
    head_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=102)  # 2% increase
    result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0)  # 1.76μs -> 1.52μs (15.8% faster)
    assert result is True


def test_allocation_decrease_above_threshold():
    """When total allocations decrease by more than threshold_pct, should return True."""
    base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=100)
    head_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=98)  # 2% decrease
    result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0)  # 1.70μs -> 1.49μs (14.1% faster)
    assert result is True


def test_allocation_change_below_threshold_returns_false():
    """When allocation change is below threshold_pct, should return False."""
    base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=100)
    head_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=100)  # 0% change
    result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0)  # 1.72μs -> 1.48μs (16.2% faster)
    assert result is False


def test_custom_threshold_1_percent():
    """With custom threshold of 1%, should detect 1.5% change but not 0.5%."""
    base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
    head_mem_above = MemoryStats(peak_memory_bytes=1015, total_allocations=10)  # 1.5%
    head_mem_below = MemoryStats(peak_memory_bytes=1005, total_allocations=10)  # 0.5%

    assert (
        has_meaningful_memory_change(base_mem, head_mem_above, threshold_pct=1.0) is True
    )  # 1.47μs -> 1.14μs (28.9% faster)
    assert (
        has_meaningful_memory_change(base_mem, head_mem_below, threshold_pct=1.0) is False
    )  # 972ns -> 862ns (12.8% faster)


def test_custom_threshold_5_percent():
    """With custom threshold of 5%, should detect 6% change but not 4%."""
    base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
    head_mem_above = MemoryStats(peak_memory_bytes=1060, total_allocations=10)  # 6%
    head_mem_below = MemoryStats(peak_memory_bytes=1040, total_allocations=10)  # 4%

    assert (
        has_meaningful_memory_change(base_mem, head_mem_above, threshold_pct=5.0) is True
    )  # 1.50μs -> 1.22μs (22.8% faster)
    assert (
        has_meaningful_memory_change(base_mem, head_mem_below, threshold_pct=5.0) is False
    )  # 852ns -> 841ns (1.31% faster)


def test_both_metrics_change_above_threshold():
    """When both memory and allocations change above threshold, should return True."""
    base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=100)
    head_mem = MemoryStats(peak_memory_bytes=1020, total_allocations=102)  # both 2% increase
    result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0)  # 1.46μs -> 1.19μs (22.7% faster)
    assert result is True


def test_base_peak_memory_zero_head_not_zero():
    """When base peak memory is zero, memory change cannot be calculated; check allocations only."""
    base_mem = MemoryStats(peak_memory_bytes=0, total_allocations=100)
    head_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=102)  # 2% allocation increase
    result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0)  # 1.49μs -> 1.39μs (7.18% faster)
    assert result is True


def test_base_peak_memory_zero_allocations_same():
    """When base peak memory is zero and allocations don't change significantly, should return False."""
    base_mem = MemoryStats(peak_memory_bytes=0, total_allocations=100)
    head_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=101)  # 1% allocation increase
    result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0)  # 1.53μs -> 1.37μs (11.7% faster)
    assert result is False


def test_base_allocations_zero_memory_changes():
    """When base allocations are zero, allocation change cannot be calculated; check memory only."""
    base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=0)
    head_mem = MemoryStats(peak_memory_bytes=1020, total_allocations=100)  # 2% memory increase
    result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0)  # 1.48μs -> 1.27μs (16.5% faster)
    assert result is True


def test_base_allocations_zero_memory_same():
    """When base allocations are zero and memory doesn't change significantly, should return False."""
    base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=0)
    head_mem = MemoryStats(peak_memory_bytes=1005, total_allocations=100)  # 0.5% memory increase
    result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0)  # 1.48μs -> 1.28μs (15.7% faster)
    assert result is False


def test_very_small_base_memory():
    """With very small base memory (1 byte), large percentage change should be detected."""
    base_mem = MemoryStats(peak_memory_bytes=1, total_allocations=10)
    head_mem = MemoryStats(peak_memory_bytes=2, total_allocations=10)  # 100% increase
    result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0)  # 1.55μs -> 1.14μs (36.0% faster)
    assert result is True


def test_large_base_memory():
    """With large base memory, percentage change calculation should still work correctly."""
    base_mem = MemoryStats(peak_memory_bytes=1_000_000_000, total_allocations=10)
    head_mem = MemoryStats(peak_memory_bytes=1_020_000_000, total_allocations=10)  # 2% increase
    result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0)  # 1.55μs -> 1.28μs (21.1% faster)
    assert result is True


def test_threshold_exactly_at_boundary():
    """When change is exactly at threshold boundary (e.g., 1.0%), should return False (not > threshold)."""
    base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
    head_mem = MemoryStats(peak_memory_bytes=1010, total_allocations=10)  # exactly 1% increase
    result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0)  # 1.75μs -> 1.46μs (19.8% faster)
    assert result is False


def test_threshold_just_above_boundary():
    """When change is just above threshold boundary (e.g., 1.01%), should return True."""
    base_mem = MemoryStats(peak_memory_bytes=10000, total_allocations=10)
    head_mem = MemoryStats(peak_memory_bytes=10101, total_allocations=10)  # 1.01% increase
    result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0)  # 1.47μs -> 1.28μs (14.8% faster)
    assert result is True


def test_threshold_zero():
    """With threshold_pct=0, any non-zero change should be detected."""
    base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
    head_mem = MemoryStats(peak_memory_bytes=1001, total_allocations=10)  # 0.1% increase
    result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=0.0)  # 1.43μs -> 1.18μs (21.0% faster)
    assert result is True


def test_threshold_zero_with_no_change():
    """With threshold_pct=0 and no change, should return False."""
    base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
    head_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
    result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=0.0)  # 1.75μs -> 1.49μs (17.5% faster)
    assert result is False


def test_negative_memory_change():
    """Negative change in memory should be handled with absolute value."""
    base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
    head_mem = MemoryStats(peak_memory_bytes=800, total_allocations=10)  # 20% decrease
    result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0)  # 1.53μs -> 1.27μs (20.4% faster)
    assert result is True


def test_negative_allocation_change():
    """Negative change in allocations should be handled with absolute value."""
    base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=100)
    head_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=80)  # 20% decrease
    result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0)  # 1.85μs -> 1.59μs (16.4% faster)
    assert result is True


def test_threshold_very_large():
    """With very large threshold_pct, no reasonable change should trigger True."""
    base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
    head_mem = MemoryStats(peak_memory_bytes=2000, total_allocations=10)  # 100% increase
    result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=101.0)  # 1.82μs -> 1.51μs (20.6% faster)
    assert result is False


def test_threshold_very_small():
    """With very small threshold_pct, even tiny changes should trigger True."""
    base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
    head_mem = MemoryStats(peak_memory_bytes=1001, total_allocations=10)  # 0.1% increase
    result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=0.01)  # 1.48μs -> 1.22μs (21.4% faster)
    assert result is True


def test_head_peak_memory_zero_base_not_zero():
    """When head peak memory is zero and base is not, it's a large decrease."""
    base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
    head_mem = MemoryStats(peak_memory_bytes=0, total_allocations=10)  # 100% decrease
    result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0)  # 1.52μs -> 1.41μs (7.86% faster)
    assert result is True


def test_head_allocations_zero_base_not_zero():
    """When head allocations are zero and base is not, it's a large decrease."""
    base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=100)
    head_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=0)  # 100% decrease
    result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0)  # 1.82μs -> 1.57μs (16.0% faster)
    assert result is True


def test_very_large_memory_values():
    """Test with extremely large memory values (terabytes range)."""
    base_mem = MemoryStats(peak_memory_bytes=1_000_000_000_000, total_allocations=10)
    head_mem = MemoryStats(peak_memory_bytes=1_020_000_000_000, total_allocations=10)  # 2% increase
    result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0)  # 1.77μs -> 1.98μs (10.6% slower)
    assert result is True


def test_very_large_allocation_values():
    """Test with extremely large allocation counts."""
    base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=1_000_000_000)
    head_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=1_020_000_000)  # 2% increase
    result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0)  # 1.82μs -> 1.63μs (11.7% faster)
    assert result is True


def test_repeated_calls_with_same_input():
    """Multiple calls with identical input should always return same result."""
    base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
    head_mem = MemoryStats(peak_memory_bytes=1010, total_allocations=10)

    result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0)  # 1.68μs -> 1.53μs (9.78% faster)
    assert result is True


def test_repeated_calls_with_no_change():
    """Multiple calls with no change should always return False."""
    base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
    head_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)

    result = has_meaningful_memory_change(base_mem, head_mem)  # 1.62μs -> 1.30μs (24.6% faster)
    assert result is False


def test_multiple_thresholds_with_same_data():
    """Test the same data with multiple different thresholds."""
    base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=100)
    head_mem = MemoryStats(peak_memory_bytes=1050, total_allocations=100)  # 5% increase

    thresholds = [0.1, 0.5, 1.0, 2.0, 4.0, 4.9, 5.0, 5.1, 10.0]
    results = [has_meaningful_memory_change(base_mem, head_mem, threshold_pct=t) for t in thresholds]

    # First 6 should be True (5% > 0.1%, 0.5%, 1%, 2%, 4%, 4.9%)
    # Last 3 should be False (5% <= 5%, 5.1%, 10%)
    assert results[:6] == [True, True, True, True, True, True]
    assert results[6:] == [False, False, False]


def test_boundary_case_with_many_iterations():
    """Test boundary conditions with varied base memory values."""
    test_bases = [100, 1000, 10000, 100000, 1000000]

    for base_memory in test_bases:
        base_mem = MemoryStats(peak_memory_bytes=base_memory, total_allocations=10)
        head_mem = MemoryStats(peak_memory_bytes=int(base_memory * 1.01), total_allocations=10)
        result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0)  # 4.39μs -> 3.90μs (12.5% faster)
        assert result is False


def test_range_of_allocation_increases():
    """Test a range of allocation increase percentages to verify threshold logic."""
    base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=1000)

    # Test allocations from 0% to 5% increase
    for increase_pct in range(6):
        head_allocations = int(1000 * (1.0 + increase_pct / 100.0))
        head_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=head_allocations)
        result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=2.0)  # 4.67μs -> 4.18μs (11.7% faster)

        # Should return True only if increase > 2%
        if increase_pct > 2:
            assert result is True, f"Failed for {increase_pct}% increase"
        else:
            assert result is False, f"Failed for {increase_pct}% increase"


def test_range_of_memory_increases():
    """Test a range of memory increase percentages to verify threshold logic."""
    base_mem = MemoryStats(peak_memory_bytes=10000, total_allocations=10)

    # Test memory from 0% to 5% increase
    for increase_pct in range(6):
        head_memory = int(10000 * (1.0 + increase_pct / 100.0))
        head_mem = MemoryStats(peak_memory_bytes=head_memory, total_allocations=10)
        result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=2.0)  # 4.41μs -> 3.81μs (15.8% faster)

        # Should return True only if increase > 2%
        if increase_pct > 2:
            assert result is True, f"Failed for {increase_pct}% increase"
        else:
            assert result is False, f"Failed for {increase_pct}% increase"


def test_stress_test_none_inputs():
    """Stress test with varied None inputs and memory configurations."""
    base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
    head_mem_small = MemoryStats(peak_memory_bytes=500, total_allocations=5)
    head_mem_large = MemoryStats(peak_memory_bytes=2000, total_allocations=20)

    assert has_meaningful_memory_change(None, base_mem) is True  # 512ns -> 521ns (1.73% slower)
    assert has_meaningful_memory_change(base_mem, None) is True  # 310ns -> 310ns (0.000% faster)
    assert has_meaningful_memory_change(None, None) is False  # 231ns -> 230ns (0.435% faster)
    assert (
        has_meaningful_memory_change(base_mem, head_mem_small, threshold_pct=1.0) is True
    )  # 1.32μs -> 1.10μs (20.1% faster)
    assert (
        has_meaningful_memory_change(base_mem, head_mem_large, threshold_pct=1.0) is True
    )  # 501ns -> 521ns (3.84% slower)
    assert (
        has_meaningful_memory_change(head_mem_small, head_mem_large, threshold_pct=1.0) is True
    )  # 421ns -> 370ns (13.8% faster)


def test_both_zero_repeated():
    """Calls with both stats having zero values."""
    base_mem = MemoryStats(peak_memory_bytes=0, total_allocations=0)
    head_mem = MemoryStats(peak_memory_bytes=0, total_allocations=0)

    result = has_meaningful_memory_change(base_mem, head_mem)  # 521ns -> 582ns (10.5% slower)
    assert result is False


def test_alternating_increase_decrease():
    """Test patterns of both increase and decrease with different magnitudes."""
    base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=100)

    # 2% increase case (above 1% threshold)
    head_mem_increase = MemoryStats(peak_memory_bytes=1020, total_allocations=102)
    result_increase = has_meaningful_memory_change(
        base_mem, head_mem_increase, threshold_pct=1.0
    )  # 1.58μs -> 1.41μs (12.0% faster)
    assert result_increase is True

    # 2% decrease case (above 1% threshold)
    head_mem_decrease = MemoryStats(peak_memory_bytes=980, total_allocations=98)
    result_decrease = has_meaningful_memory_change(
        base_mem, head_mem_decrease, threshold_pct=1.0
    )  # 712ns -> 722ns (1.39% slower)
    assert result_decrease is True

    # 0.5% increase case (below 1% threshold)
    head_mem_small = MemoryStats(peak_memory_bytes=1005, total_allocations=100)
    result_small = has_meaningful_memory_change(
        base_mem, head_mem_small, threshold_pct=1.0
    )  # 802ns -> 711ns (12.8% faster)
    assert result_small is False

To test or edit this optimization locally, run `git merge codeflash/optimize-pr1941-2026-04-02T17.07.34`.

Suggested change

Before:

    if base_mem.peak_memory_bytes == 0 and head_mem.peak_memory_bytes == 0:
        return False
    if base_mem.peak_memory_bytes > 0:
        mem_pct = abs((head_mem.peak_memory_bytes - base_mem.peak_memory_bytes) / base_mem.peak_memory_bytes) * 100
        if mem_pct > threshold_pct:
            return True
    if base_mem.total_allocations > 0:
        alloc_pct = abs((head_mem.total_allocations - base_mem.total_allocations) / base_mem.total_allocations) * 100
        if alloc_pct > threshold_pct:

After:

    b_peak = base_mem.peak_memory_bytes
    h_peak = head_mem.peak_memory_bytes
    if b_peak == 0 and h_peak == 0:
        return False
    # When base peak is positive, check relative change without creating intermediate floats
    if b_peak > 0:
        # mem_pct > threshold_pct <=> abs(h_peak - b_peak) * 100 > threshold_pct * b_peak
        if abs(h_peak - b_peak) * 100.0 > threshold_pct * b_peak:
            return True
    b_alloc = base_mem.total_allocations
    if b_alloc > 0:
        # alloc_pct > threshold_pct <=> abs(h_alloc - b_alloc) * 100 > threshold_pct * b_alloc
        if abs(head_mem.total_allocations - b_alloc) * 100.0 > threshold_pct * b_alloc:
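The rewrite relies on the identity that for positive `b`, `abs(h - b) / b * 100 > t` is equivalent to `abs(h - b) * 100 > t * b`, which drops a division and an intermediate float per call. A minimal self-contained sketch of the rewritten check, sufficient to exercise the tests above (the `threshold_pct` default of 5.0 and the `None` handling are assumptions, inferred from the test expectations rather than taken from the PR source):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemoryStats:
    peak_memory_bytes: int
    total_allocations: int


def has_meaningful_memory_change(
    base_mem: Optional[MemoryStats],
    head_mem: Optional[MemoryStats],
    threshold_pct: float = 5.0,  # assumed default; the real default is not shown in the diff
) -> bool:
    # Stats missing on exactly one side count as a meaningful change.
    if base_mem is None or head_mem is None:
        return base_mem is not head_mem
    b_peak = base_mem.peak_memory_bytes
    h_peak = head_mem.peak_memory_bytes
    if b_peak == 0 and h_peak == 0:
        return False
    # Cross-multiplied form of: abs(h_peak - b_peak) / b_peak * 100 > threshold_pct
    if b_peak > 0 and abs(h_peak - b_peak) * 100.0 > threshold_pct * b_peak:
        return True
    b_alloc = base_mem.total_allocations
    # Same cross-multiplication applied to the allocation count.
    return b_alloc > 0 and abs(head_mem.total_allocations - b_alloc) * 100.0 > threshold_pct * b_alloc
```

Within float rounding the cross-multiplied comparison decides exactly the same cases as the division-based one, so the observable behavior is unchanged.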


The hot path shows `logger.debug` consuming 18.3% of original runtime despite appearing infrequently (141 hits), because formatting the f-string occurs unconditionally even when debug logging is disabled. Wrapping it with `logger.isEnabledFor(logging.DEBUG)` defers string construction until confirmed necessary, eliminating wasteful formatting. Replacing `lambda x: x[3]` with `operator.itemgetter(3)` in the sort key reduces per-comparison overhead from a Python function call to a C-level attribute access, and hoisting the division constant `1_000_000.0` outside the loop avoids repeated float literal construction. Line profiler confirms the sort line dropped from 568 µs to 197 µs (65% faster) and the debug call from 1102 µs to 124 µs (89% faster), yielding a 45% overall speedup with no correctness or metric trade-offs.
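The micro-optimizations described above can be reproduced in isolation; a hedged sketch (the row layout, the `bench` logger name, and the values are illustrative, not taken from the PR):

```python
import logging
import operator

logger = logging.getLogger("bench")

# Illustrative profiling rows; index 3 is the per-function time used as the sort key.
rows = [
    ("parse", 141, 4, 1102.0),
    ("sort", 97, 2, 568.0),
    ("emit", 12, 1, 124.0),
]

# 1) Guard the debug call so the f-string is only built when DEBUG is enabled.
if logger.isEnabledFor(logging.DEBUG):
    logger.debug(f"benchmark rows: {rows}")

# 2) operator.itemgetter(3) is a C-level key function, cheaper per comparison
#    than the equivalent `lambda x: x[3]`.
rows.sort(key=operator.itemgetter(3), reverse=True)

# 3) Hoist the conversion constant out of the loop instead of re-creating the
#    literal on every iteration.
NS_PER_MS = 1_000_000.0
times_ms = [r[3] / NS_PER_MS for r in rows]
```

With the default WARNING level the guarded `logger.debug` line is skipped entirely, so the f-string is never formatted.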
@codeflash-ai
Contributor

codeflash-ai bot commented Apr 2, 2026

⚡️ Codeflash found optimizations for this PR

📄 45% (0.45x) speedup for validate_and_format_benchmark_table in codeflash/benchmarking/utils.py

⏱️ Runtime: 1.26 milliseconds → 869 microseconds (best of 5 runs)

A dependent PR with the suggested changes has been created. Please review:

If you approve, it will be merged into this PR (branch cf-compare-copy-benchmarks).


github-actions bot and others added 2 commits April 2, 2026 18:53
…2026-04-02T18.50.56

⚡️ Speed up function `validate_and_format_benchmark_table` by 45% in PR #1941 (`cf-compare-copy-benchmarks`)
@codeflash-ai
Contributor

codeflash-ai bot commented Apr 3, 2026

This PR is now faster! 🚀 @claude[bot] accepted my optimizations from:

@KRRT7 KRRT7 changed the title feat: add --memory flag and memory-only benchmarks to codeflash compare feat: enhance codeflash compare with memory profiling, script mode, and auto-calibration Apr 3, 2026
@KRRT7 KRRT7 merged commit accb245 into main Apr 3, 2026
27 of 29 checks passed
@KRRT7 KRRT7 deleted the cf-compare-copy-benchmarks branch April 3, 2026 12:13