feat: enhance codeflash compare with memory profiling, script mode, and auto-calibration#1941
Conversation
Claude finished @KRRT7's task — PR Review Summary
Prek Checks
✅ mypy — 3 actionable errors in new code (26 total, most pre-existing in
Code Review
Test Regression (must fix)
The PR added
Potential Runtime Bug
`Path(codeflash_benchmark_plugin.project_root)`  # project_root = None in __init__
Design:

| Functions | Lines | Body |
|---|---|---|
| fmt_ops / md_ops | 764–772 / 775–783 | Identical — both format nanoseconds as Kops/Mops/ops strings |
| fmt_bytes / md_bytes | 839–848 / 860–869 | Identical — both format byte counts as KiB/MiB/GiB/B strings |

The fmt_* variants are used in Rich console tables; the md_* variants in markdown output. The formatting is the same in both contexts — one function per pair is enough.
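A single shared helper could serve both rendering paths. A minimal sketch of what such a consolidated formatter might look like (the name, thresholds, and unit labels mirror the described behavior but are assumptions, not the PR's actual code):

```python
def fmt_ops(ns_per_op: float) -> str:
    """Format a per-operation time in nanoseconds as an ops/sec string.

    Hypothetical consolidation of fmt_ops/md_ops: the same string works
    for both Rich tables and markdown output.
    """
    if ns_per_op <= 0:
        return "n/a"
    ops = 1e9 / ns_per_op  # operations per second
    if ops >= 1e6:
        return f"{ops / 1e6:.1f} Mops"
    if ops >= 1e3:
        return f"{ops / 1e3:.1f} Kops"
    return f"{ops:.1f} ops"
```

The markdown renderer can then call the same function, removing one of the two identical bodies.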
Test Coverage

| File | Coverage |
|---|---|
| compare.py | 37% |
| plugin.py | 23% |
| cmd_compare.py | ~0% (no unit tests) |
| pytest_new_process_memory_benchmarks.py | ~0% (subprocess-only, expected) |
The 22 unit tests in tests/test_compare.py cover the formatting and rendering paths well. The main compare_branches / compare_with_script execution paths (worktrees, tracing, memray invocation) are not covered — this is expected for integration-heavy code that requires real git repos and processes. No gaps for the logic that was added.
Optimization PRs
PR #1954 (codeflash/optimize-pr1949...) has merge conflicts but was created 2026-04-01 (1 day ago) — leaving open per the <3 days policy.
When the base ref predates the addition of benchmarks, the compare command now copies the benchmarks directory from the working tree so both refs can run.
Replaces single-shot timing with multi-round auto-calibrated benchmarking:
- Adaptive iteration count discovery (scale up until round >= min_time)
- Multi-round execution with statistical aggregation (min/median/mean/stddev)
- BenchmarkStats dataclass with outlier detection
- Rich table output with Min/Median/Mean/StdDev/Rounds/Iters columns
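The calibration-then-measure flow can be sketched as follows. This is a simplified reconstruction of the described behavior: the function name, default values, and field set are assumptions, and the real BenchmarkStats also carries outlier detection.

```python
import statistics
import time
from dataclasses import dataclass

@dataclass
class BenchmarkStats:
    min_ns: float
    median_ns: float
    mean_ns: float
    stddev_ns: float
    rounds: int
    iterations: int

def calibrate_and_run(func, min_round_time_ns: int = 10_000_000,
                      rounds: int = 5) -> BenchmarkStats:
    # Calibration: double the iteration count until one round
    # takes at least min_round_time_ns
    iterations = 1
    while True:
        start = time.perf_counter_ns()
        for _ in range(iterations):
            func()
        if time.perf_counter_ns() - start >= min_round_time_ns:
            break
        iterations *= 2
    # Measurement: run several rounds, record per-iteration time for each
    per_iter_ns = []
    for _ in range(rounds):
        start = time.perf_counter_ns()
        for _ in range(iterations):
            func()
        per_iter_ns.append((time.perf_counter_ns() - start) / iterations)
    return BenchmarkStats(
        min_ns=min(per_iter_ns),
        median_ns=statistics.median(per_iter_ns),
        mean_ns=statistics.mean(per_iter_ns),
        stddev_ns=statistics.stdev(per_iter_ns) if rounds > 1 else 0.0,
        rounds=rounds,
        iterations=iterations,
    )
```

Calibrating first means very fast functions are timed over enough iterations for the clock resolution to be negligible.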
Running `codeflash compare` with no args now auto-detects:
- head_ref from current branch
- base_ref from PR base (via gh), repo default branch, or main/master
Matches pytest-benchmark's full statistical output in both Rich tables and markdown.
`get_benchmark_timings` now returns `BenchmarkStats` instead of `int`. The optimizer pipeline expects a float (nanoseconds), so `median_ns` is extracted at the boundary.
The optimized code replaces f-string formatting (`f"[green]{pct:+.0f}%[/green]"`) with pre-allocated format-string templates (`_GREEN_TPL % pct`) for the two return paths, cutting per-call overhead from ~746 ns to ~669 ns (green case) and ~634 ns to ~503 ns (red case). F-strings incur parsing and setup cost on each invocation, while the `%` operator with a module-level constant bypasses that overhead. The 10% overall speedup is achieved purely through this string-formatting change; all arithmetic and control flow remain identical.
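The pattern can be reproduced in a small sketch; the template constants follow the names quoted above, and the sign convention (negative delta shown green) is an assumption for illustration:

```python
# Module-level templates are built once; the % operator then only
# has to substitute, avoiding per-call f-string setup.
_GREEN_TPL = "[green]%+.0f%%[/green]"
_RED_TPL = "[red]%+.0f%%[/red]"

def fmt_delta(pct: float) -> str:
    """Format a percentage delta with Rich color markup (sketch)."""
    if pct < 0:
        return _GREEN_TPL % pct  # assumed: improvement shown in green
    return _RED_TPL % pct
```

The output is byte-identical to the f-string version, so only call overhead changes.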
…2026-04-01T14.15.33 ⚡️ Speed up function `fmt_delta` by 11% in PR #1941 (`cf-compare-copy-benchmarks`)

This PR is now faster! 🚀 @claude[bot] accepted my optimizations from:
⚡️ Codeflash found optimizations for this PR 📄 12% (0.12x) speedup for
The benchmark plugin now runs multiple rounds with calibrated iterations. Tests need SELECT DISTINCT for row counts and must extract median_ns from BenchmarkStats before validation.
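Why the tests now need SELECT DISTINCT: with calibrated multi-round runs, repeated calls record identical metadata, so plain row counts inflate while DISTINCT collapses them. A minimal illustration (the table name and columns are hypothetical, not the plugin's actual schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE timings (module TEXT, func TEXT, ns INTEGER)")
# Ten calls with identical metadata, as in the multithreaded sorter test
con.executemany("INSERT INTO timings VALUES (?, ?, ?)",
                [("bench", "sorter", 100)] * 10)
total = con.execute("SELECT COUNT(*) FROM timings").fetchone()[0]
distinct = con.execute(
    "SELECT COUNT(DISTINCT module || ':' || func) FROM timings"
).fetchone()[0]
```

Here `total` is 10 but `distinct` is 1, matching the 10 → 1 change described in the test updates below.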
Adds a second profiling phase using pytest-memray that runs after timing benchmarks. Memory tables are suppressed when the delta is <1%.
When --memory is used and no changed top-level functions are detected, skip trace benchmarking but still run memray profiling. This fixes the class method limitation where codeflash compare couldn't profile memory for changes in class methods (which are excluded from @codeflash_trace instrumentation due to pickle overhead).
- test_trace_multithreaded_benchmark: SELECT DISTINCT collapses all 10 threaded sorter calls to 1 row (identical metadata), change 10 → 1
- test_trace_benchmark_decorator: accept zero timing when func_time > total_time triggers the overflow guard in validate_and_format
Allows running arbitrary benchmark scripts on both git refs and rendering a styled comparison table. Supports optional --memory via memray wrapping. No codeflash config required for script mode.
```python
if base_mem.peak_memory_bytes == 0 and head_mem.peak_memory_bytes == 0:
    return False
if base_mem.peak_memory_bytes > 0:
    mem_pct = abs((head_mem.peak_memory_bytes - base_mem.peak_memory_bytes) / base_mem.peak_memory_bytes) * 100
    if mem_pct > threshold_pct:
        return True
if base_mem.total_allocations > 0:
    alloc_pct = abs((head_mem.total_allocations - base_mem.total_allocations) / base_mem.total_allocations) * 100
    if alloc_pct > threshold_pct:
```
⚡️Codeflash found 12% (0.12x) speedup for has_meaningful_memory_change in codeflash/benchmarking/compare.py
⏱️ Runtime: 108 microseconds → 96.4 microseconds (best of 157 runs)
📝 Explanation and details
The optimization hoisted repeated attribute lookups (base_mem.peak_memory_bytes, head_mem.peak_memory_bytes, base_mem.total_allocations) into local variables and replaced division-based percentage checks with algebraically equivalent cross-multiplication (abs(h_peak - b_peak) * 100.0 > threshold_pct * b_peak), eliminating one division per branch. Line profiler shows the memory percentage calculation dropped from 85.6 µs to 85.0 µs, while the allocation check rose from 31.4 µs to 53.2 µs; despite the slightly slower allocation branch, the overall runtime improved 11% because the hottest paths (the memory checks) got faster and attribute caching saved ~13 µs across 103 invocations. Tests confirm correctness is preserved across all edge cases, including None inputs, zero thresholds, and boundary conditions.
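The algebraic equivalence is easy to verify side by side; this sketch isolates just the comparison (names are illustrative), valid whenever the base value is positive:

```python
def exceeds_division(delta: int, base: int, threshold_pct: float) -> bool:
    # Original form: one division per check
    return abs(delta / base) * 100 > threshold_pct

def exceeds_crossmul(delta: int, base: int, threshold_pct: float) -> bool:
    # Optimized form: multiply both sides by base (requires base > 0)
    return abs(delta) * 100.0 > threshold_pct * base
```

Since base > 0 is already guarded before each check, multiplying both sides preserves the inequality's direction.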
✅ Correctness verification report:
| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | ✅ 23 Passed |
| 🌀 Generated Regression Tests | ✅ 97 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
⚙️ Click to see Existing Unit Tests
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| test_compare.py::TestHasMeaningfulMemoryChange.test_both_none | 611ns | 661ns | -7.56% |
| test_compare.py::TestHasMeaningfulMemoryChange.test_both_zero | 571ns | 601ns | -4.99% |
| test_compare.py::TestHasMeaningfulMemoryChange.test_no_change | 1.70μs | 1.62μs | 4.93% ✅ |
| test_compare.py::TestHasMeaningfulMemoryChange.test_one_none | 811ns | 801ns | 1.25% ✅ |
| test_compare.py::TestHasMeaningfulMemoryChange.test_significant_alloc_change | 1.73μs | 1.44μs | 20.1% ✅ |
| test_compare.py::TestHasMeaningfulMemoryChange.test_significant_peak_change | 1.38μs | 1.28μs | 7.88% ✅ |
🌀 Click to see Generated Regression Tests
from codeflash.benchmarking.compare import has_meaningful_memory_change
from codeflash.benchmarking.plugin.plugin import MemoryStats
def test_both_none_returns_false():
# When both base and head are None, there is no change -> expect False
assert has_meaningful_memory_change(None, None) is False # 861ns -> 601ns (43.3% faster)
def test_one_none_and_one_present_returns_true():
# If exactly one of the inputs is None, that's a meaningful change -> expect True
base = MemoryStats(peak_memory_bytes=0, total_allocations=0)
assert has_meaningful_memory_change(base, None) is True # 481ns -> 551ns (12.7% slower)
assert has_meaningful_memory_change(None, base) is True # 270ns -> 251ns (7.57% faster)
def test_zero_peaks_ignores_allocation_changes():
# If both peak_memory_bytes are zero, the function returns False immediately
# regardless of allocation differences
base = MemoryStats(peak_memory_bytes=0, total_allocations=100)
head = MemoryStats(peak_memory_bytes=0, total_allocations=200)
# Should short-circuit on zero peaks and return False regardless of allocation delta
assert has_meaningful_memory_change(base, head) is False # 551ns -> 601ns (8.32% slower)
def test_mem_change_exceeds_default_threshold_is_true():
# Default threshold_pct is 1.0. A change from 100 -> 200 is a 100% change -> True
base = MemoryStats(peak_memory_bytes=100, total_allocations=10)
head = MemoryStats(peak_memory_bytes=200, total_allocations=10)
assert has_meaningful_memory_change(base, head) is True # 1.49μs -> 1.31μs (13.7% faster)
def test_mem_change_equal_to_threshold_is_not_considered_meaningful():
# Change exactly equal to threshold should NOT be considered meaningful because check uses '>'
base = MemoryStats(peak_memory_bytes=100, total_allocations=10)
# 101 is 1% greater than 100 -> mem_pct == 1.0 which equals default threshold -> expect False
head = MemoryStats(peak_memory_bytes=101, total_allocations=10)
assert has_meaningful_memory_change(base, head, threshold_pct=1.0) is False # 1.94μs -> 1.80μs (7.82% faster)
def test_alloc_change_exceeds_threshold_even_if_mem_within_threshold():
# If memory change is small but allocations change exceeds threshold, function should return True.
base = MemoryStats(peak_memory_bytes=1000, total_allocations=100)
# small memory change (1000 -> 1005 => 0.5%) but allocations jump 100 -> 1000 => 900% -> True
head = MemoryStats(peak_memory_bytes=1005, total_allocations=1000)
assert has_meaningful_memory_change(base, head, threshold_pct=1.0) is True # 1.84μs -> 1.72μs (6.96% faster)
def test_both_changes_below_threshold_are_not_meaningful():
# Both memory and allocation changes are below a strict threshold -> expect False
base = MemoryStats(peak_memory_bytes=1000, total_allocations=1000)
# small deltas: 1000 -> 1009 is 0.9% and allocations 1000 -> 1008 is 0.8%
head = MemoryStats(peak_memory_bytes=1009, total_allocations=1008)
assert has_meaningful_memory_change(base, head, threshold_pct=1.0) is False # 1.75μs -> 1.43μs (22.4% faster)
def test_negative_values_are_handled_consistently():
# Though negative memory values are unrealistic, function uses arithmetic and abs(), so it should work.
base = MemoryStats(peak_memory_bytes=-100, total_allocations=-50)
head = MemoryStats(peak_memory_bytes=-150, total_allocations=-75)
# mem_pct = abs((-150 - -100) / -100) * 100 = abs(-50 / -100) * 100 = 50% -> > 1 -> True
assert has_meaningful_memory_change(base, head, threshold_pct=1.0) is True # 741ns -> 732ns (1.23% faster)
def test_threshold_parameter_changes_sensitivity():
# Increasing the threshold can make previously meaningful changes non-meaningful.
base = MemoryStats(peak_memory_bytes=100, total_allocations=10)
head = MemoryStats(peak_memory_bytes=150, total_allocations=10)
# 50% change; with threshold 10% -> True; with threshold 60% -> False
assert has_meaningful_memory_change(base, head, threshold_pct=10.0) is True # 1.59μs -> 1.34μs (18.7% faster)
assert has_meaningful_memory_change(base, head, threshold_pct=60.0) is False # 992ns -> 921ns (7.71% faster)
def test_large_scale_iterative_checks_count_expected_true_results():
# Test with diverse, realistic memory comparison scenarios
# covering both memory and allocation change branches
test_cases = [
(MemoryStats(1000, 100), MemoryStats(1500, 100), 50.0, True),
(MemoryStats(1000, 100), MemoryStats(1010, 100), 50.0, False),
(MemoryStats(5000000, 500), MemoryStats(5100000, 500), 1.0, True),
(MemoryStats(2000, 200), MemoryStats(2000, 300), 10.0, True),
(MemoryStats(512, 1000), MemoryStats(640, 1100), 25.0, True),
(MemoryStats(1048576, 5000), MemoryStats(1048600, 5000), 0.1, False),
(MemoryStats(10000, 50), MemoryStats(10050, 75), 1.0, True),
(MemoryStats(999, 500), MemoryStats(1000, 510), 0.5, True),
(MemoryStats(100000, 10000), MemoryStats(102000, 10100), 2.0, True),
(MemoryStats(50000, 1000), MemoryStats(51000, 1010), 2.0, True),
(MemoryStats(8388608, 2000), MemoryStats(8388616, 2001), 0.01, False),
(MemoryStats(256, 100), MemoryStats(256, 500), 100.0, True),
]
true_count = 0
for base, head, threshold, expected in test_cases:
result = has_meaningful_memory_change(base, head, threshold_pct=threshold) # 8.22μs -> 7.66μs (7.35% faster)
assert result is expected, f"Failed for {base} vs {head} with threshold {threshold}"
if expected:
true_count += 1
assert true_count == sum(1 for _, _, _, expected in test_cases if expected)
def test_large_scale_allocation_based_changes_varying_thresholds():
# Create realistic memory profiling scenarios with varying thresholds
# to verify allocation-based branch detection works across diverse inputs
test_scenarios = [
(MemoryStats(1000000, 100), MemoryStats(1000010, 500), 100.0, True),
(MemoryStats(500000, 1000), MemoryStats(500100, 2500), 50.0, True),
(MemoryStats(2000000, 5000), MemoryStats(2000500, 5500), 75.0, False),
(MemoryStats(8192, 200), MemoryStats(8200, 1000), 200.0, True),
(MemoryStats(4096000, 2000), MemoryStats(4098000, 3000), 25.0, True),
(MemoryStats(16777216, 10000), MemoryStats(16780000, 50000), 150.0, True),
(MemoryStats(100000, 500), MemoryStats(100500, 1000), 80.0, True),
(MemoryStats(1024000, 800), MemoryStats(1024100, 900), 10.0, False),
(MemoryStats(2097152, 3000), MemoryStats(2100000, 10000), 200.0, True),
(MemoryStats(65536, 250), MemoryStats(65600, 1250), 300.0, True),
]
count_meaningful = 0
for base, head, threshold, expected in test_scenarios:
result = has_meaningful_memory_change(base, head, threshold_pct=threshold) # 7.08μs -> 6.67μs (6.16% faster)
assert result is expected, f"Failed for base={base}, head={head}, threshold={threshold}"
if expected:
count_meaningful += 1
assert count_meaningful == sum(1 for _, _, _, expected in test_scenarios if expected)

# imports
from codeflash.benchmarking.compare import has_meaningful_memory_change
from codeflash.benchmarking.plugin.plugin import MemoryStats
def test_both_none_returns_false():
"""When both base_mem and head_mem are None, should return False."""
result = has_meaningful_memory_change(None, None) # 541ns -> 521ns (3.84% faster)
assert result is False
def test_base_none_head_not_none_returns_true():
"""When base_mem is None and head_mem is not None, should return True."""
head_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
result = has_meaningful_memory_change(None, head_mem) # 471ns -> 521ns (9.60% slower)
assert result is True
def test_base_not_none_head_none_returns_true():
"""When base_mem is not None and head_mem is None, should return True."""
base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
result = has_meaningful_memory_change(base_mem, None) # 491ns -> 491ns (0.000% faster)
assert result is True
def test_both_zero_memory_returns_false():
"""When both have zero peak memory and zero allocations, should return False."""
base_mem = MemoryStats(peak_memory_bytes=0, total_allocations=0)
head_mem = MemoryStats(peak_memory_bytes=0, total_allocations=0)
result = has_meaningful_memory_change(base_mem, head_mem) # 531ns -> 591ns (10.2% slower)
assert result is False
def test_identical_stats_returns_false():
"""When both stats are identical, should return False (0% change)."""
base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
head_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
result = has_meaningful_memory_change(base_mem, head_mem) # 1.71μs -> 1.59μs (7.53% faster)
assert result is False
def test_memory_increase_above_threshold():
"""When peak memory increases by more than threshold_pct, should return True."""
base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
head_mem = MemoryStats(peak_memory_bytes=1020, total_allocations=10) # 2% increase
result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0) # 1.65μs -> 1.45μs (13.8% faster)
assert result is True
def test_memory_decrease_above_threshold():
"""When peak memory decreases by more than threshold_pct, should return True."""
base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
head_mem = MemoryStats(peak_memory_bytes=980, total_allocations=10) # 2% decrease
result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0) # 1.59μs -> 1.39μs (14.4% faster)
assert result is True
def test_memory_change_below_threshold_returns_false():
"""When memory change is below threshold_pct, should return False."""
base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
head_mem = MemoryStats(peak_memory_bytes=1005, total_allocations=10) # 0.5% increase
result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0) # 1.66μs -> 1.50μs (10.6% faster)
assert result is False
def test_allocation_increase_above_threshold():
"""When total allocations increase by more than threshold_pct, should return True."""
base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=100)
head_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=102) # 2% increase
result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0) # 1.76μs -> 1.52μs (15.8% faster)
assert result is True
def test_allocation_decrease_above_threshold():
"""When total allocations decrease by more than threshold_pct, should return True."""
base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=100)
head_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=98) # 2% decrease
result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0) # 1.70μs -> 1.49μs (14.1% faster)
assert result is True
def test_allocation_change_below_threshold_returns_false():
"""When allocation change is below threshold_pct, should return False."""
base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=100)
head_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=100) # 0% change
result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0) # 1.72μs -> 1.48μs (16.2% faster)
assert result is False
def test_custom_threshold_1_percent():
"""With custom threshold of 1%, should detect 1.5% change but not 0.5%."""
base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
head_mem_above = MemoryStats(peak_memory_bytes=1015, total_allocations=10) # 1.5%
head_mem_below = MemoryStats(peak_memory_bytes=1005, total_allocations=10) # 0.5%
assert (
has_meaningful_memory_change(base_mem, head_mem_above, threshold_pct=1.0) is True
) # 1.47μs -> 1.14μs (28.9% faster)
assert (
has_meaningful_memory_change(base_mem, head_mem_below, threshold_pct=1.0) is False
) # 972ns -> 862ns (12.8% faster)
def test_custom_threshold_5_percent():
"""With custom threshold of 5%, should detect 6% change but not 4%."""
base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
head_mem_above = MemoryStats(peak_memory_bytes=1060, total_allocations=10) # 6%
head_mem_below = MemoryStats(peak_memory_bytes=1040, total_allocations=10) # 4%
assert (
has_meaningful_memory_change(base_mem, head_mem_above, threshold_pct=5.0) is True
) # 1.50μs -> 1.22μs (22.8% faster)
assert (
has_meaningful_memory_change(base_mem, head_mem_below, threshold_pct=5.0) is False
) # 852ns -> 841ns (1.31% faster)
def test_both_metrics_change_above_threshold():
"""When both memory and allocations change above threshold, should return True."""
base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=100)
head_mem = MemoryStats(peak_memory_bytes=1020, total_allocations=102) # both 2% increase
result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0) # 1.46μs -> 1.19μs (22.7% faster)
assert result is True
def test_base_peak_memory_zero_head_not_zero():
"""When base peak memory is zero, memory change cannot be calculated; check allocations only."""
base_mem = MemoryStats(peak_memory_bytes=0, total_allocations=100)
head_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=102) # 2% allocation increase
result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0) # 1.49μs -> 1.39μs (7.18% faster)
assert result is True
def test_base_peak_memory_zero_allocations_same():
"""When base peak memory is zero and allocations don't change significantly, should return False."""
base_mem = MemoryStats(peak_memory_bytes=0, total_allocations=100)
head_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=101) # 1% allocation increase
result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0) # 1.53μs -> 1.37μs (11.7% faster)
assert result is False
def test_base_allocations_zero_memory_changes():
"""When base allocations are zero, allocation change cannot be calculated; check memory only."""
base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=0)
head_mem = MemoryStats(peak_memory_bytes=1020, total_allocations=100) # 2% memory increase
result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0) # 1.48μs -> 1.27μs (16.5% faster)
assert result is True
def test_base_allocations_zero_memory_same():
"""When base allocations are zero and memory doesn't change significantly, should return False."""
base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=0)
head_mem = MemoryStats(peak_memory_bytes=1005, total_allocations=100) # 0.5% memory increase
result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0) # 1.48μs -> 1.28μs (15.7% faster)
assert result is False
def test_very_small_base_memory():
"""With very small base memory (1 byte), large percentage change should be detected."""
base_mem = MemoryStats(peak_memory_bytes=1, total_allocations=10)
head_mem = MemoryStats(peak_memory_bytes=2, total_allocations=10) # 100% increase
result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0) # 1.55μs -> 1.14μs (36.0% faster)
assert result is True
def test_large_base_memory():
"""With large base memory, percentage change calculation should still work correctly."""
base_mem = MemoryStats(peak_memory_bytes=1_000_000_000, total_allocations=10)
head_mem = MemoryStats(peak_memory_bytes=1_020_000_000, total_allocations=10) # 2% increase
result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0) # 1.55μs -> 1.28μs (21.1% faster)
assert result is True
def test_threshold_exactly_at_boundary():
"""When change is exactly at threshold boundary (e.g., 1.0%), should return False (not > threshold)."""
base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
head_mem = MemoryStats(peak_memory_bytes=1010, total_allocations=10) # exactly 1% increase
result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0) # 1.75μs -> 1.46μs (19.8% faster)
assert result is False
def test_threshold_just_above_boundary():
"""When change is just above threshold boundary (e.g., 1.01%), should return True."""
base_mem = MemoryStats(peak_memory_bytes=10000, total_allocations=10)
head_mem = MemoryStats(peak_memory_bytes=10101, total_allocations=10) # 1.01% increase
result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0) # 1.47μs -> 1.28μs (14.8% faster)
assert result is True
def test_threshold_zero():
"""With threshold_pct=0, any non-zero change should be detected."""
base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
head_mem = MemoryStats(peak_memory_bytes=1001, total_allocations=10) # 0.1% increase
result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=0.0) # 1.43μs -> 1.18μs (21.0% faster)
assert result is True
def test_threshold_zero_with_no_change():
"""With threshold_pct=0 and no change, should return False."""
base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
head_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=0.0) # 1.75μs -> 1.49μs (17.5% faster)
assert result is False
def test_negative_memory_change():
"""Negative change in memory should be handled with absolute value."""
base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
head_mem = MemoryStats(peak_memory_bytes=800, total_allocations=10) # 20% decrease
result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0) # 1.53μs -> 1.27μs (20.4% faster)
assert result is True
def test_negative_allocation_change():
"""Negative change in allocations should be handled with absolute value."""
base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=100)
head_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=80) # 20% decrease
result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0) # 1.85μs -> 1.59μs (16.4% faster)
assert result is True
def test_threshold_very_large():
"""With very large threshold_pct, no reasonable change should trigger True."""
base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
head_mem = MemoryStats(peak_memory_bytes=2000, total_allocations=10) # 100% increase
result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=101.0) # 1.82μs -> 1.51μs (20.6% faster)
assert result is False
def test_threshold_very_small():
"""With very small threshold_pct, even tiny changes should trigger True."""
base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
head_mem = MemoryStats(peak_memory_bytes=1001, total_allocations=10) # 0.1% increase
result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=0.01) # 1.48μs -> 1.22μs (21.4% faster)
assert result is True
def test_head_peak_memory_zero_base_not_zero():
"""When head peak memory is zero and base is not, it's a large decrease."""
base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
head_mem = MemoryStats(peak_memory_bytes=0, total_allocations=10) # 100% decrease
result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0) # 1.52μs -> 1.41μs (7.86% faster)
assert result is True
def test_head_allocations_zero_base_not_zero():
"""When head allocations are zero and base is not, it's a large decrease."""
base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=100)
head_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=0) # 100% decrease
result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0) # 1.82μs -> 1.57μs (16.0% faster)
assert result is True
def test_very_large_memory_values():
"""Test with extremely large memory values (terabytes range)."""
base_mem = MemoryStats(peak_memory_bytes=1_000_000_000_000, total_allocations=10)
head_mem = MemoryStats(peak_memory_bytes=1_020_000_000_000, total_allocations=10) # 2% increase
result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0) # 1.77μs -> 1.98μs (10.6% slower)
assert result is True
def test_very_large_allocation_values():
"""Test with extremely large allocation counts."""
base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=1_000_000_000)
head_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=1_020_000_000) # 2% increase
result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0) # 1.82μs -> 1.63μs (11.7% faster)
assert result is True
def test_repeated_calls_with_same_input():
"""Multiple calls with identical input should always return same result."""
base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
head_mem = MemoryStats(peak_memory_bytes=1010, total_allocations=10)
result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0) # 1.68μs -> 1.53μs (9.78% faster)
assert result is True
def test_repeated_calls_with_no_change():
"""Multiple calls with no change should always return False."""
base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
head_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
result = has_meaningful_memory_change(base_mem, head_mem) # 1.62μs -> 1.30μs (24.6% faster)
assert result is False
def test_multiple_thresholds_with_same_data():
"""Test the same data with multiple different thresholds."""
base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=100)
head_mem = MemoryStats(peak_memory_bytes=1050, total_allocations=100) # 5% increase
thresholds = [0.1, 0.5, 1.0, 2.0, 4.0, 4.9, 5.0, 5.1, 10.0]
results = [has_meaningful_memory_change(base_mem, head_mem, threshold_pct=t) for t in thresholds]
# First 5 should be True (5% > 0.1%, 0.5%, 1%, 2%, 4%)
# Last 4 should be False (5% <= 4.9%, 5%, 5.1%, 10%)
assert results[:5] == [True, True, True, True, True]
assert results[5:] == [False, False, False, False]
def test_boundary_case_with_many_iterations():
"""Test boundary conditions with varied base memory values."""
test_bases = [100, 1000, 10000, 100000, 1000000]
for base_memory in test_bases:
base_mem = MemoryStats(peak_memory_bytes=base_memory, total_allocations=10)
head_mem = MemoryStats(peak_memory_bytes=int(base_memory * 1.01), total_allocations=10)
result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=1.0) # 4.39μs -> 3.90μs (12.5% faster)
assert result is False
def test_range_of_allocation_increases():
"""Test a range of allocation increase percentages to verify threshold logic."""
base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=1000)
# Test allocations from 0% to 5% increase
for increase_pct in range(6):
head_allocations = int(1000 * (1.0 + increase_pct / 100.0))
head_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=head_allocations)
result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=2.0) # 4.67μs -> 4.18μs (11.7% faster)
# Should return True only if increase > 2%
if increase_pct > 2:
assert result is True, f"Failed for {increase_pct}% increase"
else:
assert result is False, f"Failed for {increase_pct}% increase"
def test_range_of_memory_increases():
"""Test a range of memory increase percentages to verify threshold logic."""
base_mem = MemoryStats(peak_memory_bytes=10000, total_allocations=10)
# Test memory from 0% to 5% increase
for increase_pct in range(6):
head_memory = int(10000 * (1.0 + increase_pct / 100.0))
head_mem = MemoryStats(peak_memory_bytes=head_memory, total_allocations=10)
result = has_meaningful_memory_change(base_mem, head_mem, threshold_pct=2.0) # 4.41μs -> 3.81μs (15.8% faster)
# Should return True only if increase > 2%
if increase_pct > 2:
assert result is True, f"Failed for {increase_pct}% increase"
else:
assert result is False, f"Failed for {increase_pct}% increase"
def test_stress_test_none_inputs():
"""Stress test with varied None inputs and memory configurations."""
base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=10)
head_mem_small = MemoryStats(peak_memory_bytes=500, total_allocations=5)
head_mem_large = MemoryStats(peak_memory_bytes=2000, total_allocations=20)
assert has_meaningful_memory_change(None, base_mem) is True # 512ns -> 521ns (1.73% slower)
assert has_meaningful_memory_change(base_mem, None) is True # 310ns -> 310ns (0.000% faster)
assert has_meaningful_memory_change(None, None) is False # 231ns -> 230ns (0.435% faster)
assert (
has_meaningful_memory_change(base_mem, head_mem_small, threshold_pct=1.0) is True
) # 1.32μs -> 1.10μs (20.1% faster)
assert (
has_meaningful_memory_change(base_mem, head_mem_large, threshold_pct=1.0) is True
) # 501ns -> 521ns (3.84% slower)
assert (
has_meaningful_memory_change(head_mem_small, head_mem_large, threshold_pct=1.0) is True
) # 421ns -> 370ns (13.8% faster)
def test_both_zero_repeated():
"""Calls with both stats having zero values."""
base_mem = MemoryStats(peak_memory_bytes=0, total_allocations=0)
head_mem = MemoryStats(peak_memory_bytes=0, total_allocations=0)
result = has_meaningful_memory_change(base_mem, head_mem) # 521ns -> 582ns (10.5% slower)
assert result is False
def test_alternating_increase_decrease():
"""Test patterns of both increase and decrease with different magnitudes."""
base_mem = MemoryStats(peak_memory_bytes=1000, total_allocations=100)
# 2% increase case (above 1% threshold)
head_mem_increase = MemoryStats(peak_memory_bytes=1020, total_allocations=102)
result_increase = has_meaningful_memory_change(
base_mem, head_mem_increase, threshold_pct=1.0
) # 1.58μs -> 1.41μs (12.0% faster)
assert result_increase is True
# 2% decrease case (above 1% threshold)
head_mem_decrease = MemoryStats(peak_memory_bytes=980, total_allocations=98)
result_decrease = has_meaningful_memory_change(
base_mem, head_mem_decrease, threshold_pct=1.0
) # 712ns -> 722ns (1.39% slower)
assert result_decrease is True
# 0.5% increase case (below 1% threshold)
head_mem_small = MemoryStats(peak_memory_bytes=1005, total_allocations=100)
result_small = has_meaningful_memory_change(
base_mem, head_mem_small, threshold_pct=1.0
) # 802ns -> 711ns (12.8% faster)
assert result_small is False

To test or edit this optimization locally: `git merge codeflash/optimize-pr1941-2026-04-02T17.07.34`
Click to see suggested changes
```diff
-    if base_mem.peak_memory_bytes == 0 and head_mem.peak_memory_bytes == 0:
-        return False
-    if base_mem.peak_memory_bytes > 0:
-        mem_pct = abs((head_mem.peak_memory_bytes - base_mem.peak_memory_bytes) / base_mem.peak_memory_bytes) * 100
-        if mem_pct > threshold_pct:
-            return True
-    if base_mem.total_allocations > 0:
-        alloc_pct = abs((head_mem.total_allocations - base_mem.total_allocations) / base_mem.total_allocations) * 100
-        if alloc_pct > threshold_pct:
+    b_peak = base_mem.peak_memory_bytes
+    h_peak = head_mem.peak_memory_bytes
+    if b_peak == 0 and h_peak == 0:
+        return False
+    # When base peak is positive, check relative change without creating intermediate floats
+    if b_peak > 0:
+        # mem_pct > threshold_pct <=> abs(h_peak - b_peak) * 100 > threshold_pct * b_peak
+        if abs(h_peak - b_peak) * 100.0 > threshold_pct * b_peak:
+            return True
+    b_alloc = base_mem.total_allocations
+    if b_alloc > 0:
+        # alloc_pct > threshold_pct <=> abs(h_alloc - b_alloc) * 100 > threshold_pct * b_alloc
+        if abs(head_mem.total_allocations - b_alloc) * 100.0 > threshold_pct * b_alloc:
```
The hot path shows `logger.debug` consuming 18.3% of original runtime despite appearing infrequently (141 hits), because formatting the f-string occurs unconditionally even when debug logging is disabled. Wrapping it with `logger.isEnabledFor(logging.DEBUG)` defers string construction until confirmed necessary, eliminating wasteful formatting. Replacing `lambda x: x[3]` with `operator.itemgetter(3)` in the sort key reduces per-comparison overhead from a Python function call to a C-level attribute access, and hoisting the division constant `1_000_000.0` outside the loop avoids repeated float literal construction. Line profiler confirms the sort line dropped from 568 µs to 197 µs (65% faster) and the debug call from 1102 µs to 124 µs (89% faster), yielding a 45% overall speedup with no correctness or metric trade-offs.
⚡️ Codeflash found optimizations for this PR 📄 45% (0.45x) speedup for

…2026-04-02T18.50.56 ⚡️ Speed up function `validate_and_format_benchmark_table` by 45% in PR #1941 (`cf-compare-copy-benchmarks`)

This PR is now faster! 🚀 @claude[bot] accepted my optimizations from:
Summary
Overhaul of `codeflash compare` with richer benchmarking, new modes, and better output:
- `--output` flag: export results as markdown
- `--memory` flag: peak memory profiling via pytest-memray; supports memory-only benchmarks when no changed top-level functions are detected (e.g. class method changes)
- `--script` mode: run compare via a user-provided benchmark script
- Extract `median_ns` from `BenchmarkStats` for the optimizer pipeline

Memory-only benchmarks

When `--memory` is set and no changed top-level functions are detected, compare:

Test plan
- `codeflash compare --memory` with changed functions: timing + memory
- `codeflash compare --memory` with no changed functions: memory-only output
- `codeflash compare --script` runs a user-provided benchmark script
- `codeflash compare` without flags: behavior unchanged
- `prek` passes