perf: expand profiling benchmarks around public API workflows #349
acgetchell merged 8 commits into main from …
Conversation
- Run profiling comparisons with the checked-out crate toolchain by default
- Add local `just profile` support for comparing code refs and compiler versions
- Expand `ci_performance_suite` beyond construction to cover hulls, boundary traversal, validation, and bistellar flips
- Emit a versioned API benchmark manifest so benchmark logs show which public workflows were measured
Walkthrough

Adds a CI-focused public API benchmarking suite and manifest, reworks local profiling tooling (…)

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Dev as Developer / justfile
    participant CI as GitHub Actions
    participant Git as Git (checkout / worktree)
    participant Cargo as rustup / cargo
    participant Script as capture_metadata.sh
    participant FS as Filesystem / Artifacts
    Dev->>CI: push / trigger profiling workflow
    CI->>Git: checkout repo (optional worktree for code_ref)
    CI->>Cargo: setup Rust toolchain (rust-toolchain.toml or requested)
    CI->>Cargo: run `cargo bench --profile perf --bench ci_performance_suite`
    Cargo->>FS: emit `target/criterion/**` and `profiling-results/` artifacts
    CI->>Script: run `./scripts/ci/capture_profiling_metadata.sh` (BENCH_FILTER / PROFILE_METADATA_MODE)
    Script->>FS: write `profiling-results/environment_metadata.md`
    CI->>FS: upload `profiling-results/*` as workflow artifacts
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
🚥 Pre-merge checks | ✅ Passed checks (5 passed)
Up to standards ✅

🟢 Issues

| Metric | Results |
|---|---|
| Complexity | 34 |
🟢 Coverage: 100.00% diff coverage · +0.00% coverage variation

| Metric | Results |
|---|---|
| Coverage variation | ✅ +0.00% coverage variation (-1.00%) |
| Diff coverage | ✅ 100.00% diff coverage |

Coverage variation details

| Commit | Coverable lines | Covered lines | Coverage |
|---|---|---|---|
| Common ancestor commit (6e437fb) | 50908 | 45309 | 89.00% |
| Head commit (70703fb) | 50908 (+0) | 45309 (+0) | 89.00% (+0.00%) |

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch:
`<coverage of head commit> - <coverage of common ancestor commit>`

Diff coverage details

| | Coverable lines | Covered lines | Diff coverage |
|---|---|---|---|
| Pull request (#349) | 4 | 4 | 100.00% |

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified:
`<covered lines added or modified> / <coverable lines added or modified> * 100%`
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/profiling-benchmarks.yml:
- Around line 121-132: The TOML metadata extraction commands (the
declared_toolchain and rust_version command substitutions) are currently fatal
under set -euo pipefail when files or keys are missing; change them to be
best-effort by making the command substitutions non-fatal (e.g., ensure failures
in the pipelines do not cause the script to exit) so missing rust-toolchain.toml
or Cargo.toml simply produce empty/default variables; update both
declared_toolchain and rust_version extractions (and the analogous block around
lines 320-331) to swallow errors or provide a safe fallback so metadata
collection is optional rather than a hard failure.
In `@benches/ci_performance_suite.rs`:
- Around line 336-405: The benchmark is measuring candidate discovery plus flip
cost because roundtrip_k2_4d and roundtrip_k3_4d scan inside the timed closure;
move discovery into the setup closure by extracting functions that find a
flippable handle (e.g. find_flippable_k2_facet_4d and
find_flippable_k3_ridge_4d) using the existing collect_interior_facets_4d and
collect_ridges_4d iteration logic, then change roundtrip_k2_4d and
roundtrip_k3_4d to take the found handle as an argument and only perform
dt.flip_k2 / dt.flip_k3 and their inverse; update the benchmark to clone base_dt
in the setup, call the new finder there to return (dt, facet/ridge), and keep
the timed closure to call the adjusted roundtrip functions so only the flip
roundtrip is measured.
- Around line 210-222: prepare_triangulation currently only generates a single
well-conditioned seeded random triangulation (via prepare_benchmark_data and
BenchTriangulation::new_with_options), so CI benchmarks never exercise
adversarial geometry; update the benchmark data generation to include
adversarial datasets by either extending prepare_triangulation to accept a
mode/seed that constructs adversarial inputs (near-boundary points, cospherical
sets, degenerate simplices, large coordinates) or add a companion function
(e.g., prepare_adversarial_triangulation) that uses prepare_benchmark_data or a
custom generator to inject those edge cases, then update the benchmarks that
call prepare_triangulation to run both the well-conditioned and adversarial
variants (or parameterize over seeds and RetryPolicy::Shuffled base_seed) so
each benchmark exercises both robust and well-conditioned inputs using
BenchTriangulation, ConstructionOptions, and RetryPolicy.
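To make the request concrete, here is a minimal, dependency-free sketch of an adversarial generator; the name `adversarial_points`, the inline xorshift PRNG, and the case mix are illustrative assumptions rather than the suite's real helpers, which would presumably feed `BenchTriangulation::new_with_options` via `prepare_benchmark_data`:

```rust
// Illustrative sketch only; names and case mix are assumptions, not the crate's API.
/// Generate `count` points in D dimensions that stress geometric robustness:
/// near-cospherical points, tight near-duplicate clusters, and large coordinates.
fn adversarial_points<const D: usize>(count: usize, seed: u64) -> Vec<[f64; D]> {
    // Small xorshift PRNG so the sketch stays dependency-free but seeded.
    let mut state = seed | 1;
    let mut next_unit = move || {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        (state >> 11) as f64 / (1u64 << 53) as f64 // uniform in [0, 1)
    };

    (0..count)
        .map(|i| {
            let mut p = [0.0_f64; D];
            match i % 3 {
                0 => {
                    // Near-cospherical: normalize onto the unit sphere, then jitter slightly.
                    let mut norm_sq = 0.0;
                    for c in &mut p {
                        *c = next_unit() - 0.5;
                        norm_sq += *c * *c;
                    }
                    let norm = norm_sq.sqrt().max(1e-12);
                    for c in &mut p {
                        *c = *c / norm + 1e-9 * (next_unit() - 0.5);
                    }
                }
                1 => {
                    // Near-duplicates: a tight cluster that produces sliver simplices.
                    for c in &mut p {
                        *c = 0.25 + 1e-10 * (next_unit() - 0.5);
                    }
                }
                _ => {
                    // Large-magnitude coordinates to stress floating-point predicates.
                    for c in &mut p {
                        *c = 1e6 * (next_unit() - 0.5);
                    }
                }
            }
            p
        })
        .collect()
}
```

Parameterizing such a generator over an explicit seed keeps the adversarial runs as reproducible as the existing `RetryPolicy::Shuffled` / `base_seed` path.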
In `@justfile`:
- Around line 440-452: The command-substitution that sets requested_toolchain
should be made tolerant to failures so set -euo pipefail doesn't abort early:
modify the subshell/pipeline that greps rust-toolchain.toml for channel to
suppress non-zero exit (e.g., append a fallback like "|| true" to the pipeline
or to the grep) so it returns an empty string instead of failing, then keep the
existing if [[ -z "$requested_toolchain" ]] check to show the user-facing error
and exit; target the assignment that populates the requested_toolchain variable
from rust-toolchain.toml/channel.
- Around line 472-490: The profile recipe writes metadata saying "Cargo profile:
cargo bench default" but invokes rustup run "$requested_toolchain" cargo bench
--bench ci_performance_suite without the perf profile; update the invocation of
cargo bench in the block that runs in "$workdir" (the rustup run ... cargo bench
--bench ci_performance_suite command) to include the CI perf profile (add
--profile perf) and also update the metadata written to
"$run_dir/profile_metadata.md" (the "Cargo profile:" line) to reflect "cargo
bench --profile perf" so local runs match CI baselines.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 15aa59a8-db11-48b8-be8d-e6af1a08046d
📒 Files selected for processing (3)
- .github/workflows/profiling-benchmarks.yml
- benches/ci_performance_suite.rs
- justfile
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

```
@@            Coverage Diff            @@
##              main     #349   +/-   ##
=========================================
  Coverage    88.98%   88.98%
=========================================
  Files           58       58
  Lines        50713    50713
=========================================
  Hits         45125    45125
  Misses        5588     5588
```
Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
- Run profiling benchmarks against the selected code and Rust toolchain versions.
- Expand ci_performance_suite around public API workflows, adversarial inputs, incremental insertion, validation, convex hulls, and bistellar flip roundtrips.
- Retire microbenchmarks in favor of CI and profiling suite coverage.
- Update benchmark summary generation and performance results for the expanded CI suite, with circumsphere results grouped separately.
- Add focused prelude exports for generator-heavy doctests, examples, tests, and benchmarks.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
benches/profiling_suite.rs (1)
530-545: ⚠️ Potential issue | 🟠 Major

Keep `actual_point_counts` bookkeeping out of `measure(...)`.

`actual_point_counts.push(points.len())` runs inside the counted closure, so once that `SmallBuffer` grows past its inline capacity it starts charging benchmark-harness allocations as triangulation allocations. With `BENCHMARK_ITERATION_BUFFER_SIZE = 8` and the default Criterion sample size of 10, this skews the memory numbers in the common case.

💡 Suggested fix

```diff
 for _ in 0..iters {
     let start_time = Instant::now();
+    let mut actual_count = 0usize;
     let alloc_info = measure(|| {
         let points = gen_points::<D>(count, PointDistribution::Random, DEFAULT_SEED);
         let vertices: Vec<_> = points.iter().map(|p| vertex!(*p)).collect();
-        actual_point_counts.push(points.len()); // Track actual count
+        actual_count = points.len();
         if let Ok(dt) = DelaunayTriangulationBuilder::new(&vertices).build::<()>() {
             black_box(dt);
         }
     });
     total_time += start_time.elapsed();
     allocation_infos.push(alloc_info);
+    actual_point_counts.push(actual_count);
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benches/profiling_suite.rs` around lines 530 - 545, The measured closure currently includes actual_point_counts.push(points.len()), which causes benchmark allocations to be attributed to the triangulation; to fix, compute and store the points length outside the measure closure (e.g., let points = gen_points::<D>(...); let pts_len = points.len(); then call measure with the triangulation work using points/vertices) and then push pts_len into actual_point_counts after measure returns; update the block around gen_points, vertex! mapping, DelaunayTriangulationBuilder::new(...).build::<()>() and allocation_infos handling so only the triangulation work is measured and bookkeeping (actual_point_counts.push) occurs outside the measured closure.
🧹 Nitpick comments (5)
benches/ci_performance_suite.rs (3)
412-432: `roundtrip_k1_4d` picks an arbitrary cell each iteration.

`dt.cells().next()` returns the first cell from a `SlotMap` iterator; after `flip_k1_insert` mutates the topology and `flip_k1_remove` reverses it, subsequent iterations operate on a freshly cloned `base_dt` again, so the cell choice is stable across iterations. That's fine, but the choice depends on `SlotMap` iteration order, which is deterministic for a given map state but not guaranteed to be the centroid-friendly cell. If a future change to the seed/insertion path lands the "first cell" on a sliver, the centroid may fall outside or on a facet and `flip_k1_insert` will start failing with a panic.

A small hardening: pick the cell with the largest volume (or any deterministic non-degenerate choice) once outside the timed closure and pass its `CellKey` in via the setup closure.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benches/ci_performance_suite.rs` around lines 412 - 432, roundtrip_k1_4d currently picks an arbitrary cell via dt.cells().next(), which can pick a degenerate/sliver cell under different map orders; instead, before the timed iterations compute a deterministic, non-degenerate CellKey (e.g. iterate all cells on the cloned base triangulation and pick the one with the largest volume using cell_centroid_4d/volume helper) and pass that CellKey into the benchmark closure so roundtrip_k1_4d uses that precomputed CellKey for flip_k1_insert and flip_k1_remove; modify the setup to produce the chosen CellKey from FlipTriangulation4 and update roundtrip_k1_4d signature to accept a CellKey (and use dt.tds().vertex_key_from_uuid/new_uuid logic unchanged).
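A generic sketch of that hardening, kept abstract because the crate's cell-key and volume APIs are not shown here; `largest_volume_cell` and `volume_of` are placeholder names for whatever the crate actually exposes:

```rust
/// Pick a deterministic, non-degenerate cell: the one with the largest volume.
/// `K` stands in for the crate's cell key type; `volume_of` for its volume helper.
fn largest_volume_cell<K: Copy>(
    cells: impl IntoIterator<Item = K>,
    volume_of: impl Fn(K) -> f64,
) -> Option<K> {
    cells
        .into_iter()
        .map(|key| (key, volume_of(key)))
        // Drop degenerate cells (zero, negative, or NaN volume) up front.
        .filter(|(_, vol)| vol.is_finite() && *vol > 0.0)
        // Safe after the filter: no NaNs remain, so partial_cmp always succeeds.
        .max_by(|a, b| a.1.partial_cmp(&b.1).expect("volumes filtered to be comparable"))
        .map(|(key, _)| key)
}
```

The selected key would be computed once in the setup closure and handed to the timed roundtrip, so the measured path stays flip-only.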
309-338: Adversarial seed search has no `DELAUNAY_BENCH_DISCOVER_SEEDS` escape hatch.

`prepare_data` falls back to `find_seed_vertices` and uses `seed_search_limit()` (default 2000) so users can extend the search via `DELAUNAY_BENCH_DISCOVER_SEEDS_LIMIT`. `prepare_adv_data` shares the limit but, unlike well-conditioned cases, has no `KNOWN_SEEDS` table to short-circuit. If the search ever fails for one of the new `_adversarial` cases, the only recourse will be patching the source; there is no equivalent to the documented `DELAUNAY_BENCH_DISCOVER_SEEDS=1` workflow for adversarial inputs.

Consider either (a) adding a `KNOWN_ADV_SEEDS` table once a stable suite of seeds is established, or (b) extending the discover-seeds hook in `benchmark_tds_new_dimension!` to also emit the adversarial seed when `tds_new_<D>d/tds_new_adversarial/<count>` is filtered.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benches/ci_performance_suite.rs` around lines 309 - 338, prepare_adv_data currently brute-forces seeds with seed_search_limit() and will panic if none found, but lacks the documented discover-seeds escape hatch; update prepare_adv_data to (1) consult a new KNOWN_ADV_SEEDS table (map of dimension+count -> seed) before searching and return the stored seed/points/vertices if present, and (2) honor the existing DELAUNAY_BENCH_DISCOVER_SEEDS / DELAUNAY_BENCH_DISCOVER_SEEDS_LIMIT workflow by checking the env var at runtime: when DELAUNAY_BENCH_DISCOVER_SEEDS is set, continue the search as now but on a successful candidate from BenchTriangulation::new_with_options emit a machine-readable line (e.g. "ADV_SEED D count seed") to stdout or the logger and return instead of panicking so CI can record the adversarial seed; ensure you still use seed_search_limit() for the limit and reference prepare_adv_data, BenchTriangulation::new_with_options, seed_search_limit, and the new KNOWN_ADV_SEEDS lookup when making changes.
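Option (a) could look roughly like this; the seed values are placeholders rather than verified seeds, and `discover_seeds_enabled` simply mirrors the `DELAUNAY_BENCH_DISCOVER_SEEDS` convention the well-conditioned path already documents:

```rust
use std::env;

/// Placeholder seeds keyed by (dimension, point count); real entries would be
/// recorded once a stable set of adversarial seeds has been discovered.
const KNOWN_ADV_SEEDS: &[((usize, usize), u64)] = &[
    ((3, 50), 0x00C0_FFEE),
    ((4, 50), 0x0BAD_F00D),
];

fn known_adv_seed(dim: usize, count: usize) -> Option<u64> {
    KNOWN_ADV_SEEDS
        .iter()
        .find(|((d, c), _)| *d == dim && *c == count)
        .map(|(_, seed)| *seed)
}

/// When set, a successful brute-force search should print the seed it found
/// (so it can be added to the table) instead of panicking on exhaustion.
fn discover_seeds_enabled() -> bool {
    env::var("DELAUNAY_BENCH_DISCOVER_SEEDS").is_ok_and(|v| v != "0")
}
```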
1185-1213: Hoist flip-candidate discovery out of the per-batch setup closure.

`flippable_k2_facet_4d` / `flippable_k3_ridge_4d` currently run once per `iter_batched` batch, linearly scanning all interior facets/ridges and cloning `dt` for each candidate to trial-flip. Since `base_dt` is immutable and `FacetHandle` / `RidgeHandle` are stable across `clone()` (SlotMap preserves all key slots and versions exactly), compute the handle once before `bench_function` and reuse it across batches, eliminating O(N_batches) redundant discovery scans.

♻️ Hoist discovery to once per benchmark

```diff
+let k2_facet = flippable_k2_facet_4d(&base_dt);
 group.bench_function("k2_roundtrip", |b| {
     b.iter_batched(
-        || {
-            let dt = base_dt.clone();
-            let facet = flippable_k2_facet_4d(&dt);
-            (dt, facet)
-        },
+        || (base_dt.clone(), k2_facet),
         |(mut dt, facet)| {
             roundtrip_k2_4d(&mut dt, facet);
             black_box(dt);
         },
         BatchSize::LargeInput,
     );
 });
+let k3_ridge = flippable_k3_ridge_4d(&base_dt);
 group.bench_function("k3_roundtrip", |b| {
     b.iter_batched(
-        || {
-            let dt = base_dt.clone();
-            let ridge = flippable_k3_ridge_4d(&dt);
-            (dt, ridge)
-        },
+        || (base_dt.clone(), k3_ridge),
         |(mut dt, ridge)| {
             roundtrip_k3_4d(&mut dt, ridge);
             black_box(dt);
         },
         BatchSize::LargeInput,
     );
 });
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@benches/ci_performance_suite.rs` around lines 1185 - 1213, The benchmark currently calls flippable_k2_facet_4d and flippable_k3_ridge_4d inside the iter_batched setup closure, causing a full scan and dt.clone() per batch; hoist the discovery out so you compute the candidate handles once before each group.bench_function call (using base_dt to find the FacetHandle / RidgeHandle) and then pass those handles into iter_batched, leaving only the mutable dt and the trial flip in the per-batch closure; update the "k2_roundtrip" and "k3_roundtrip" benchmarks to use the precomputed handles instead of calling flippable_k2_facet_4d / flippable_k3_ridge_4d inside the closure so discovery runs once per benchmark.

.github/workflows/profiling-benchmarks.yml (1)
316-350: Same metadata block, duplicated.

The `comprehensive-profiling` and `memory-stress-test` jobs now carry near-identical "Capture profiling environment metadata" steps. If you tweak the format or add a field (e.g. `cargo bench --profile`), you'll need to keep them in sync. Not blocking, but a small composite action or a shared shell script under `scripts/ci/` would let you change the format in one place.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.github/workflows/profiling-benchmarks.yml around lines 316 - 350, The metadata capture block duplicated between the comprehensive-profiling and memory-stress-test jobs should be centralized: extract the shell sequence currently in the "Capture memory profiling environment metadata" step into a single reusable script (e.g. scripts/ci/capture_profiling_metadata.sh) or a composite GitHub Action, make it executable, and then replace each job's inline step with a single run calling that script (or uses: ./path/to/composite). Update references in the comprehensive-profiling and memory-stress-test job steps so they invoke the new reusable unit and ensure it writes profiling-results/environment_metadata.md as before.

scripts/benchmark_utils.py (1)
888-897: Use `startswith` for the `bistellar_flips` prefix to be future-proof.

The manifest currently limits bistellar_flips benchmarks to 4D only, but all other performance groups (`construction`, `boundary_facets`, `convex_hull`, `validation`, `incremental_insert`) are dimension-agnostic. When the Rust suite adds `bistellar_flips_3d` or `bistellar_flips_5d` (aligning with the PR's stated intent to expand benchmark coverage), the current exact match `== "bistellar_flips_4d"` will fail to recognize them. The results will silently drop with no warning. Change to `startswith("bistellar_flips")` to keep pace with the manifest without manual sync.

Proposed change

```diff
 if first_path_part.startswith("tds_new_"):
     return "construction"
-if first_path_part == "bistellar_flips_4d":
+if first_path_part.startswith("bistellar_flips"):
     return "bistellar_flips"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/benchmark_utils.py` around lines 888 - 897, The _ci_suite_group_key static method currently checks first_path_part == "bistellar_flips_4d", which will miss future variants like "bistellar_flips_3d"; change that equality to use first_path_part.startswith("bistellar_flips") so any dimensional suffix maps to the "bistellar_flips" group, keeping behavior consistent with CI_PERFORMANCE_SUITE_GROUPS and the other prefix-based checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@benches/ci_performance_suite.rs`:
- Around line 265-279: prepare_inserts currently derives the same seed for both
Dataset::WellConditioned and Dataset::Adversarial which makes adversarial
inserts share RNG state with well-conditioned ones; change the seed derivation
inside prepare_inserts to be dataset-aware (e.g., compute seed =
dim_seed.wrapping_add(0x5151_5151) and then for Dataset::Adversarial apply an
extra twist such as seed ^= 0xA5A5_A5A5) before calling generate_adv_points or
generate_random_points_seeded so that generate_adv_points::<D>(count, seed) and
generate_random_points_seeded::<f64, D>(..., seed) receive distinct,
reproducible seeds while keeping prepare_inserts, generate_adv_points, and
generate_random_points_seeded signatures unchanged.
- Around line 90-133: The hardcoded benchmark_ids in API_BENCHMARK_ENTRIES
(field benchmark_ids on ApiBenchmarkEntry entries like group "construction",
"boundary_facets", etc.) can drift from runtime constants (e.g. COUNTS,
COUNTS_5D and any insert/operation count constants); replace the brittle string
literals by building the benchmark id strings at startup from those constants
(or add a debug_assert in the module that computes expected strings from
COUNTS/COUNTS_5D and compares them against the hardcoded benchmark_ids) so the
manifest always reflects the actual runtime values; locate API_BENCHMARK_ENTRIES
and either convert it to a function that returns Vec<ApiBenchmarkEntry>
constructed from the constants or add a validation helper that runs at init to
assert equality.
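As a sketch of the derive-at-startup option, assuming an illustrative `COUNTS` constant and ID format (not necessarily the suite's real ones):

```rust
/// Illustrative constants; the real suite would reuse its own COUNTS/COUNTS_5D.
const COUNTS: &[usize] = &[1_000, 5_000];

/// Build the construction benchmark IDs from the runtime constants so the
/// manifest cannot silently drift from what the benches actually register.
fn construction_benchmark_ids(dim: usize) -> Vec<String> {
    COUNTS
        .iter()
        .map(|count| format!("tds_new_{dim}d/tds_new/{count}"))
        .collect()
}

#[cfg(test)]
mod manifest_drift {
    use super::*;

    // The weaker alternative from the comment: keep a hardcoded manifest if
    // preferred, but fail fast whenever it stops matching the derived IDs.
    #[test]
    fn hardcoded_ids_match_constants() {
        let hardcoded = vec![
            "tds_new_3d/tds_new/1000".to_string(),
            "tds_new_3d/tds_new/5000".to_string(),
        ];
        assert_eq!(hardcoded, construction_benchmark_ids(3));
    }
}
```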
In `@benches/profiling_suite.rs`:
- Around line 743-797: Add adversarial benchmark companions by invoking the
existing benchmark_validation_components_dimension! macro to create
`_adversarial` variants: call benchmark_validation_components_dimension!(2,
benchmark_validation_components_2d_adversarial, <count>); and likewise for
3D/4D/5D (use the same or slightly larger counts as needed). Inside those runs,
switch the point generator to produce adversarial/near-boundary inputs (e.g.,
use the adversarial distribution variant of gen_points, such as
PointDistribution::Adversarial or the project’s equivalent) so the macro will
build triangulations from adversarial vertices and register groups named like
"validation_components_2d_adversarial" (the macro already formats group names
from $dim, so keep the suffix by naming the function accordingly). Ensure each
adversarial invocation mirrors the existing benchmark names
(`benchmark_validation_components_<Nd>`) and expectations so the new benches
exercise degenerate/near-boundary cases alongside the current random ones.
In `@justfile`:
- Around line 425-438: The trap for cleanup must be installed before running git
worktree add so a failed worktree add doesn't leak tmp_parent; after creating
tmp_parent and setting workdir, set cleanup_worktree=1 and immediately register
trap cleanup EXIT (i.e., move the trap registration to just after
tmp_parent="$(mktemp -d ...)" / workdir="$tmp_parent/worktree" and before git
worktree add), and ensure the cleanup() function references workdir/tmp_parent
as it does now; keep the existing git worktree remove and rm -rf logic
unchanged.
---
Outside diff comments:
In `@benches/profiling_suite.rs`:
- Around line 530-545: The measured closure currently includes
actual_point_counts.push(points.len()), which causes benchmark allocations to be
attributed to the triangulation; to fix, compute and store the points length
outside the measure closure (e.g., let points = gen_points::<D>(...); let
pts_len = points.len(); then call measure with the triangulation work using
points/vertices) and then push pts_len into actual_point_counts after measure
returns; update the block around gen_points, vertex! mapping,
DelaunayTriangulationBuilder::new(...).build::<()>() and allocation_infos
handling so only the triangulation work is measured and bookkeeping
(actual_point_counts.push) occurs outside the measured closure.
---
Nitpick comments:
In @.github/workflows/profiling-benchmarks.yml:
- Around line 316-350: The metadata capture block duplicated between the
comprehensive-profiling and memory-stress-test jobs should be centralized:
extract the shell sequence currently in the "Capture memory profiling
environment metadata" step into a single reusable script (e.g.
scripts/ci/capture_profiling_metadata.sh) or a composite GitHub Action, make it
executable, and then replace each job's inline step with a single run calling
that script (or uses: ./path/to/composite). Update references in the
comprehensive-profiling and memory-stress-test job steps so they invoke the new
reusable unit and ensure it writes profiling-results/environment_metadata.md as
before.
In `@benches/ci_performance_suite.rs`:
- Around line 412-432: roundtrip_k1_4d currently picks an arbitrary cell via
dt.cells().next(), which can pick a degenerate/sliver cell under different map
orders; instead, before the timed iterations compute a deterministic,
non-degenerate CellKey (e.g. iterate all cells on the cloned base triangulation
and pick the one with the largest volume using cell_centroid_4d/volume helper)
and pass that CellKey into the benchmark closure so roundtrip_k1_4d uses that
precomputed CellKey for flip_k1_insert and flip_k1_remove; modify the setup to
produce the chosen CellKey from FlipTriangulation4 and update roundtrip_k1_4d
signature to accept a CellKey (and use dt.tds().vertex_key_from_uuid/new_uuid
logic unchanged).
- Around line 309-338: prepare_adv_data currently brute-forces seeds with
seed_search_limit() and will panic if none found, but lacks the documented
discover-seeds escape hatch; update prepare_adv_data to (1) consult a new
KNOWN_ADV_SEEDS table (map of dimension+count -> seed) before searching and
return the stored seed/points/vertices if present, and (2) honor the existing
DELAUNAY_BENCH_DISCOVER_SEEDS / DELAUNAY_BENCH_DISCOVER_SEEDS_LIMIT workflow by
checking the env var at runtime: when DELAUNAY_BENCH_DISCOVER_SEEDS is set,
continue the search as now but on a successful candidate from
BenchTriangulation::new_with_options emit a machine-readable line (e.g.
"ADV_SEED D count seed") to stdout or the logger and return instead of panicking
so CI can record the adversarial seed; ensure you still use seed_search_limit()
for the limit and reference prepare_adv_data,
BenchTriangulation::new_with_options, seed_search_limit, and the new
KNOWN_ADV_SEEDS lookup when making changes.
- Around line 1185-1213: The benchmark currently calls flippable_k2_facet_4d and
flippable_k3_ridge_4d inside the iter_batched setup closure, causing a full scan
and dt.clone() per batch; hoist the discovery out so you compute the candidate
handles once before each group.bench_function call (using base_dt to find the
FacetHandle / RidgeHandle) and then pass those handles into iter_batched,
leaving only the mutable dt and the trial flip in the per-batch closure; update
the "k2_roundtrip" and "k3_roundtrip" benchmarks to use the precomputed handles
instead of calling flippable_k2_facet_4d / flippable_k3_ridge_4d inside the
closure so discovery runs once per benchmark.
In `@scripts/benchmark_utils.py`:
- Around line 888-897: The _ci_suite_group_key static method currently checks
first_path_part == "bistellar_flips_4d", which will miss future variants like
"bistellar_flips_3d"; change that equality to use
first_path_part.startswith("bistellar_flips") so any dimensional suffix maps to
the "bistellar_flips" group, keeping behavior consistent with
CI_PERFORMANCE_SUITE_GROUPS and the other prefix-based checks.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 7bea92c5-4a15-4958-9105-8f3d754e113d
📒 Files selected for processing (16)
- .github/workflows/profiling-benchmarks.yml
- Cargo.toml
- benches/PERFORMANCE_RESULTS.md
- benches/README.md
- benches/ci_performance_suite.rs
- benches/microbenchmarks.rs
- benches/profiling_suite.rs
- docs/code_organization.md
- examples/convex_hull_3d_100_points.rs
- examples/pachner_roundtrip_4d.rs
- justfile
- scripts/README.md
- scripts/benchmark_utils.py
- scripts/tests/test_benchmark_utils.py
- src/lib.rs
- tests/prelude_exports.rs
💤 Files with no reviewable changes (3)
- docs/code_organization.md
- Cargo.toml
- benches/microbenchmarks.rs
✅ Files skipped from review due to trivial changes (3)
- examples/pachner_roundtrip_4d.rs
- scripts/README.md
- benches/PERFORMANCE_RESULTS.md
…uration

- The `ci_performance_suite` now dynamically generates benchmark IDs, reducing maintenance overhead and improving consistency. It also leverages pre-computed adversarial seeds for stable and reproducible performance measurements across various operations and dimensions.
- The `profiling_suite` gains support for adversarial point distributions in validation benchmarks, using a robust seed search to ensure valid triangulations for complex cases.
- Profiling environment metadata capture is refactored into a reusable script, simplifying workflow definitions in GitHub Actions.
- The `justfile` separates `SlotMap` backend linting into a dedicated `just check-storage-backends` command, providing a focused compatibility canary.
- The `just profile` command's cleanup logic is made more robust, ensuring temporary worktree removal in all scenarios.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
benches/profiling_suite.rs (1)
488-492: ⚠️ Potential issue | 🟡 Minor

Configured percentile is not reflected in the summary label.

You made the percentile configurable here, but line 523 still prints a hardcoded "95th percentile" label, which becomes incorrect when `BENCH_PERCENTILE` is overridden.

💡 Minimal fix

```diff
-    println!(
-        "95th percentile bytes: {} ({:.2} MB)",
-        percentile_95,
-        percentile_95 as f64 / (1024.0 * 1024.0)
-    );
+    println!(
+        "Configured percentile bytes: {} ({:.2} MB)",
+        percentile_95,
+        percentile_95 as f64 / (1024.0 * 1024.0)
+    );
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benches/profiling_suite.rs` around lines 488 - 492, The summary label still uses a hardcoded "95th percentile" string even though you parse BENCH_PERCENTILE into the percentile variable; update the print/format call that emits the summary label (where the "95th percentile" text is generated) to use the percentile variable instead, formatting it as an ordinal (e.g., "{percentile}th percentile" or compute "st/nd/rd/th" if desired) so the label reflects the configured value from percentile; ensure any tests or expected strings are adjusted accordingly.

benches/ci_performance_suite.rs (1)

304-309: ⚠️ Potential issue | 🟡 Minor

Gate benchmark diagnostics with `bench-logging` and keep measured closures log-free.

Current diagnostics are runtime-env gated, not feature gated, and `error!` is still reachable inside `b.iter` closures. Please gate benchmark diagnostics with `#[cfg(feature = "bench-logging")]` and remove logging calls from measured closures.

💡 Suggested direction

```diff
-fn bench_logging_enabled() -> bool {
-    env::var("DELAUNAY_BENCH_LOG").is_ok_and(|value| value != "0")
-}
+#[cfg(feature = "bench-logging")]
+fn bench_logging_enabled() -> bool {
+    env::var("DELAUNAY_BENCH_LOG").is_ok_and(|value| value != "0")
+}
+
+#[cfg(not(feature = "bench-logging"))]
+const fn bench_logging_enabled() -> bool {
+    false
+}
```

```diff
-    warn!( ... );
+    if bench_logging_enabled() {
+        warn!( ... );
+    }
```

```diff
-    if bench_logging_enabled() {
-        error!( ... );
-    }
     panic!( ... );
```

As per coding guidelines, "Gate non-essential test/benchmark diagnostics behind feature flags using test-debug for test diagnostics and bench-logging for benchmark diagnostics in Rust code" and "Never log inside hot benchmark loops or Criterion-measured closures; emit setup/summary diagnostics outside the measured path instead in Rust benchmarks".
Also applies to: 427-432, 701-707, 839-848, 889-897
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benches/ci_performance_suite.rs` around lines 304 - 309, Gate all benchmark diagnostic logging behind the "bench-logging" feature and remove logs from measured closures: wrap the diagnostic blocks that call warn!, error!, or similar (the warn! in the shown snippet and the other occurrences at the reported ranges) with #[cfg(feature = "bench-logging")] (or put the logging in a small helper function annotated with that attribute) and move any logging that currently lives inside Criterion-measured closures (the closures passed to b.iter) to run once before or after b.iter; ensure the code inside b.iter contains no calls to warn!/error! or other logging macros.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/benchmark_utils.py`:
- Around line 900-905: The _ci_suite_dimension function's regex doesn't match
dimensions preceded by a slash (e.g., "boundary_facets/2d/1000") because the
leading anchor only allows start or underscore; update the pattern to accept
slash as a valid prefix by changing the leading non-capturing group to include
"/" (so it becomes (?:^|_|/)), keep the rest of the pattern (capture (\d+)d and
allow trailing _ or / or end), and ensure the function still returns
f"{match.group(1)}D" or "n/a" as before; search for _ci_suite_dimension and
benchmark_id to locate the change.
- Around line 492-494: Narrow the broad except in the ci_performance_suite
runner by catching the specific errors that run_cargo_command() can raise:
replace the generic "except Exception as e:" around the run_cargo_command() call
with "except ExecutableNotFoundError as e:", "except subprocess.TimeoutExpired
as e:", and "except OSError as e:" (do not add subprocess.CalledProcessError
since run_cargo_command() isn't called with check=True); also add "from
subprocess_utils import ExecutableNotFoundError" to the imports at the top of
the file and keep the existing error logging/return False behavior for each case
so failures remain visible.
---
Outside diff comments:
In `@benches/ci_performance_suite.rs`:
- Around line 304-309: Gate all benchmark diagnostic logging behind the
"bench-logging" feature and remove logs from measured closures: wrap the
diagnostic blocks that call warn!, error!, or similar (the warn! in the shown
snippet and the other occurrences at the reported ranges) with #[cfg(feature =
"bench-logging")] (or put the logging in a small helper function annotated with
that attribute) and move any logging that currently lives inside
Criterion-measured closures (the closures passed to b.iter) to run once before
or after b.iter; ensure the code inside b.iter contains no calls to warn!/error!
or other logging macros.
In `@benches/profiling_suite.rs`:
- Around line 488-492: The summary label still uses a hardcoded “95th
percentile” string even though you parse BENCH_PERCENTILE into the percentile
variable; update the print/format call that emits the summary label (where the
"95th percentile" text is generated) to use the percentile variable instead,
formatting it as an ordinal (e.g., "{percentile}th percentile" or compute
"st/nd/rd/th" if desired) so the label reflects the configured value from
percentile; ensure any tests or expected strings are adjusted accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 951fdaa4-7c60-4b04-bc0c-23f45fe64b2b
📒 Files selected for processing (7)
- .github/workflows/profiling-benchmarks.yml
- benches/ci_performance_suite.rs
- benches/profiling_suite.rs
- docs/dev/commands.md
- justfile
- scripts/benchmark_utils.py
- scripts/ci/capture_profiling_metadata.sh
✅ Files skipped from review due to trivial changes (1)
- docs/dev/commands.md
- Upgrade `uv` Python package manager to version `0.11.8` across all GitHub Actions workflows for consistent environments.
- Make internal benchmark logging conditional on the `bench-logging` feature in `ci_performance_suite`, reducing default log verbosity.
- Enhance memory profiling in `profiling_suite` with configurable percentile calculation for allocation summaries, improving analysis flexibility.
- Improve error handling and reporting in Python benchmark execution scripts for greater reliability and clearer failure feedback.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
benches/profiling_suite.rs (1)
129-140: ⚠️ Potential issue | 🟠 Major

Don't print fake allocation statistics in the default bench configuration.

When `count-allocations` is off, `measure()` returns zeros, `print_alloc_banner_once()` stays silent unless `bench-logging` is also on, and the suite still emits the full memory summary. That makes the default `memory_profiling` output look real even though it's placeholder data. Please either skip this summary when allocation counting is disabled, or make the placeholder status explicit without depending on `bench-logging`.

As per coding guidelines, "Gate non-essential test/benchmark diagnostics behind feature flags using test-debug for test diagnostics and bench-logging for benchmark diagnostics in Rust code".
Also applies to: 520-553, 652-655
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benches/profiling_suite.rs` around lines 129 - 140, The benchmark prints a full memory summary with fake zeros when the feature "count-allocations" is disabled; update the code so when count-allocations is not enabled (i.e., in the #[cfg(not(feature = "count-allocations"))] branch) you either skip emitting the memory summary entirely or emit an explicit, unconditional placeholder message (not gated on bench-logging) so consumers know the stats are not real. Locate the measure function and the print_alloc_banner_once function and change the summary-emission call sites (where AllocationInfo::default() results are printed) to check cfg!(feature = "count-allocations") and if false either return early (no summary) or call print_alloc_banner_once_unconditional/print_placeholder_message to log a clear "allocation stats unavailable: count-allocations feature disabled" message; ensure the gating follows the guideline to not rely on bench-logging for this placeholder behavior.
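One minimal way to realize the second option (an explicit placeholder that does not rely on `bench-logging`) is sketched below; the function name and message wording are assumptions:

```rust
/// Emit the memory summary only when allocation counting is actually compiled
/// in; otherwise print an unconditional, clearly-labelled placeholder line.
fn print_alloc_summary_or_placeholder(total_bytes: u64, peak_bytes: u64) {
    if cfg!(feature = "count-allocations") {
        println!("allocations: total = {total_bytes} bytes, peak = {peak_bytes} bytes");
    } else {
        // Not gated behind `bench-logging`: readers of the default output must
        // see that these numbers were never measured.
        println!("allocation stats unavailable: `count-allocations` feature disabled");
    }
}
```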
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/benchmark_utils.py`:
- Around line 104-131: The new CI suite IDs defined in
CI_PERFORMANCE_SUITE_GROUPS (and its order CI_PERFORMANCE_SUITE_GROUP_ORDER) are
only used for markdown, so update the baseline/regression pipeline by extending
CriterionParser.find_criterion_results to emit and compare the expanded suite
IDs (e.g., "boundary_facets/boundary_facets_3d_adversarial/50") instead of
collapsing to the old "{points}_{dimension}" keys; specifically, map parsed
markdown/group names from CI_PERFORMANCE_SUITE_GROUPS into the baseline key
generation and comparison logic inside find_criterion_results (and any baseline
serialization/deserialization paths it uses) so baseline generation and
regression comparisons use the same expanded identifiers and preserve distinct
groups per dimension/input.
---
Outside diff comments:
In `@benches/profiling_suite.rs`:
- Around line 129-140: The benchmark prints a full memory summary with fake
zeros when the feature "count-allocations" is disabled; update the code so when
count-allocations is not enabled (i.e., in the #[cfg(not(feature =
"count-allocations"))] branch) you either skip emitting the memory summary
entirely or emit an explicit, unconditional placeholder message (not gated on
bench-logging) so consumers know the stats are not real. Locate the measure
function and the print_alloc_banner_once function and change the
summary-emission call sites (where AllocationInfo::default() results are
printed) to check cfg!(feature = "count-allocations") and if false either return
early (no summary) or call
print_alloc_banner_once_unconditional/print_placeholder_message to log a clear
"allocation stats unavailable: count-allocations feature disabled" message;
ensure the gating follows the guideline to not rely on bench-logging for this
placeholder behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: a3fe1711-4067-48fc-ad72-13f2fd3bd02f
📒 Files selected for processing (7)
- .github/workflows/benchmarks.yml
- .github/workflows/ci.yml
- .github/workflows/generate-baseline.yml
- benches/ci_performance_suite.rs
- benches/profiling_suite.rs
- scripts/benchmark_utils.py
- scripts/tests/test_benchmark_utils.py
✅ Files skipped from review due to trivial changes (3)
- .github/workflows/ci.yml
- .github/workflows/benchmarks.yml
- .github/workflows/generate-baseline.yml
🚧 Files skipped from review as they are similar to previous changes (1)
- benches/ci_performance_suite.rs
Adopt explicit `benchmark_id` fields in Python scripts to enable more robust identification, storage, and comparison of benchmark results. This is critical for handling complex benchmark suites, such as `ci_performance_suite`, which may not always have a simple `points` based dimension, or which may have multiple benchmarks for the same points/dimension combination. Baseline files and markdown reports now support these expanded IDs. Additionally, internal memory profiling in `profiling_suite.rs` is made conditional on the `count-allocations` feature. When the feature is not enabled, allocation tracking structures and logic are omitted or reduced to no-ops, reducing compilation overhead and clarifying the intent of memory statistics.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
scripts/benchmark_utils.py (1)
1573-1616: ⚠️ Potential issue | 🟠 Major

Don't serialize non-size CI cases as `0 Points`.

When `points` is `None` for the new non-numeric IDs (for example the bistellar-flip `.../roundtrip` cases), line 1602 coerces it to `0`. That leaks into baseline/comparison output as fake `=== 0 Points (...) ===` entries and then gets sorted/rendered like a real workload. This needs an explicit non-numeric input field, or `points` needs to stay optional end-to-end, instead of using a sentinel zero.

🤖 Prompt for AI Agents
969-980: ⚠️ Potential issue | 🟠 Major

Clamp `BENCH_SAMPLE_SIZE` to the Criterion minimum before passing it.

The example configuration shows `BENCH_SAMPLE_SIZE=5`, but Criterion 0.8.2 requires `sample_size ≥ 10` and panics if passed a smaller value. Users following the documented example will encounter a runtime panic before the benchmark starts.

Suggested fix

```diff
-    let sample_size = env::var("BENCH_SAMPLE_SIZE")
+    let sample_size = env::var("BENCH_SAMPLE_SIZE")
         .ok()
         .and_then(|v| v.parse().ok())
+        .map(|v: usize| v.max(10))
         .unwrap_or(10);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benches/profiling_suite.rs` around lines 969 - 980, The BENCH_SAMPLE_SIZE environment value is passed directly into Criterion::default().sample_size(sample_size) but Criterion 0.8.2 requires a minimum of 10, causing a panic for smaller values; update the logic that computes sample_size (the let sample_size = ... binding) to clamp or max the parsed value to 10 before calling Criterion::default().sample_size(...), e.g. parse the env var as currently done and then replace the unwrap_or with something that ensures sample_size >= 10 so Criterion::sample_size always receives at least 10.
🧹 Nitpick comments (3)
scripts/benchmark_utils.py (1)
1621-1639: Centralize the CI-suite ID parsing helpers.

These `_ci_suite_group_key` / `_ci_suite_dimension` helpers now exist in both `PerformanceSummaryGenerator` and `CriterionParser`. That makes it easy for summary rendering and baseline parsing to drift on the same benchmark IDs. Please lift them to shared module-level helpers and reuse them from both call sites. As per coding guidelines, `scripts/*.py`: "Focus on code quality, maintainability, and adherence to Python best practices."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@scripts/benchmark_utils.py` around lines 1621 - 1639, The two methods _ci_suite_group_key and _ci_suite_dimension should be extracted from PerformanceSummaryGenerator into module-level functions (e.g., ci_suite_group_key and ci_suite_dimension) in scripts/benchmark_utils.py and the duplicate copies removed from CriterionParser; update both call sites to call the new module-level helpers (referencing the original method names _ci_suite_group_key/_ci_suite_dimension for locating them and the classes PerformanceSummaryGenerator and CriterionParser for where to change calls), keep the same behavior and signatures (return types and regex) and add a short docstring for each shared function to explain purpose.

scripts/tests/test_benchmark_utils.py (2)
2288-2292: Optional: centralize section-title literals used across markdown assertions.

You now assert multiple related headings in several tests; using shared constants would make title changes less brittle.
Also applies to: 2342-2342, 2836-2837
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/tests/test_benchmark_utils.py` around lines 2288 - 2292, Centralize the repeated markdown heading literals by defining shared constants (e.g., PUBLIC_API_TITLE = "### Public API Performance Contract (`ci_performance_suite`)", CIRCUMSPHERE_TITLE = "## Circumsphere Predicate Analysis", PERFORMANCE_RANKING_TITLE = "### Performance Ranking", RECOMMENDATIONS_TITLE = "### Recommendations", PERFORMANCE_UPDATES_TITLE = "## Performance Data Updates") at the top of scripts/tests/test_benchmark_utils.py and replace the inline string assertions that reference those exact literals with the corresponding constants; also update the other occurrences noted (near the other assertion locations) to use the same constants so future heading changes only require updating one definition.
266-279: Consider extracting the duplicated `write_estimate` test helper.

The same nested helper appears twice. Pulling it into a shared module-level helper or fixture would reduce maintenance drift.
Also applies to: 2299-2312
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/tests/test_benchmark_utils.py` around lines 266 - 279, Two identical nested helpers named write_estimate are duplicated in the tests; extract that logic into a single module-level helper or pytest fixture (e.g., a top-level function write_estimate(target_dir, path_parts, mean_ns) or a fixture like write_estimate_fixture) and replace both nested definitions with calls to the common helper; update usages in the tests that currently call the nested write_estimate (the copies shown and the one at lines ~2299-2312) so they import or reference the new module-level function/fixture and keep the same behavior (creating criterion/<path_parts>/base, writing estimates.json with mean.point_estimate and confidence_interval).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@benches/profiling_suite.rs`:
- Around line 579-643: The allocation diagnostics (calls around
print_alloc_banner_once(), the #[cfg(feature = "count-allocations")] block that
computes avg_info using allocation_infos, actual_point_counts, bytes_max_values,
configured_percentile(), calculate_percentile(), and print_alloc_summary()) are
being executed inside the Criterion measured closure; move this entire
diagnostics emission out of b.iter_custom (or any Criterion-measured closure) so
it runs once during setup/teardown instead of per-sample, and gate it behind the
bench-logging feature (e.g., #[cfg(feature = "bench-logging")] around the moved
code) while keeping the computation helpers (configured_percentile,
calculate_percentile) accessible from the new location.
---
Outside diff comments:
In `@benches/profiling_suite.rs`:
- Around line 969-980: The BENCH_SAMPLE_SIZE environment value is passed
directly into Criterion::default().sample_size(sample_size) but Criterion 0.8.2
requires a minimum of 10, causing a panic for smaller values; update the logic
that computes sample_size (the let sample_size = ... binding) to clamp or max
the parsed value to 10 before calling Criterion::default().sample_size(...),
e.g. parse the env var as currently done and then replace the unwrap_or with
something that ensures sample_size >= 10 so Criterion::sample_size always
receives at least 10.
In `@scripts/benchmark_utils.py`:
- Around line 1573-1616: The code currently coerces a missing points value to 0
via BenchmarkData(points or 0, dimension) which creates bogus "0 Points"
workloads; update parse_estimates_json to preserve points being None by
constructing the BenchmarkData with the raw optional points (i.e.,
BenchmarkData(points, dimension)) instead of coercing to 0, and ensure the
BenchmarkData class/constructor (and any serialization logic) accepts
Optional[int] for its size field and renders a non-numeric/empty label when
points is None; keep the existing conditional throughput block (it already
guards on points is not None) and only compute/attach throughput when points is
present.
---
Nitpick comments:
In `@scripts/benchmark_utils.py`:
- Around line 1621-1639: The two methods _ci_suite_group_key and
_ci_suite_dimension should be extracted from PerformanceSummaryGenerator into
module-level functions (e.g., ci_suite_group_key and ci_suite_dimension) in
scripts/benchmark_utils.py and the duplicate copies removed from
CriterionParser; update both call sites to call the new module-level helpers
(referencing the original method names _ci_suite_group_key/_ci_suite_dimension
for locating them and the classes PerformanceSummaryGenerator and
CriterionParser for where to change calls), keep the same behavior and
signatures (return types and regex) and add a short docstring for each shared
function to explain purpose.
In `@scripts/tests/test_benchmark_utils.py`:
- Around line 2288-2292: Centralize the repeated markdown heading literals by
defining shared constants (e.g., PUBLIC_API_TITLE = "### Public API Performance
Contract (`ci_performance_suite`)", CIRCUMSPHERE_TITLE = "## Circumsphere
Predicate Analysis", PERFORMANCE_RANKING_TITLE = "### Performance Ranking",
RECOMMENDATIONS_TITLE = "### Recommendations", PERFORMANCE_UPDATES_TITLE = "##
Performance Data Updates") at the top of scripts/tests/test_benchmark_utils.py
and replace the inline string assertions that reference those exact literals
with the corresponding constants; also update the other occurrences noted (near
the other assertion locations) to use the same constants so future heading
changes only require updating one definition.
- Around line 266-279: Two identical nested helpers named write_estimate are
duplicated in the tests; extract that logic into a single module-level helper or
pytest fixture (e.g., a top-level function write_estimate(target_dir,
path_parts, mean_ns) or a fixture like write_estimate_fixture) and replace both
nested definitions with calls to the common helper; update usages in the tests
that currently call the nested write_estimate (the copies shown and the one at
lines ~2299-2312) so they import or reference the new module-level
function/fixture and keep the same behavior (creating
criterion/<path_parts>/base, writing estimates.json with mean.point_estimate and
confidence_interval).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 9c1b1ad5-8d86-4942-9eae-c1804923cba4
📒 Files selected for processing (5)
- benches/profiling_suite.rs
- scripts/benchmark_models.py
- scripts/benchmark_utils.py
- scripts/tests/test_benchmark_models.py
- scripts/tests/test_benchmark_utils.py
…oling

Update benchmark scripts to correctly model and report workloads that do not operate on a numeric input size (e.g., bistellar flips). This involves allowing benchmark data to represent an absent point count and adapting parsing, sorting, and reporting logic. Memory profiling in `profiling_suite.rs` is also refined: allocation statistics are now gated on both the `count-allocations` and `bench-logging` features, and reporting logic is restructured for clarity and robustness across benchmark samples. Shared utility functions were extracted in the Python scripts.
@coderabbitai full review
Actionable comments posted: 6
🧹 Nitpick comments (1)
scripts/benchmark_utils.py (1)
183-188: Derive non-numeric labels from the ID instead of hard-coded defaults.

Right now every non-`adversarial` case becomes `well-conditioned`, and every non-numeric tail becomes `roundtrip`. That matches today's flip benchmarks by coincidence, but it will mislabel the next unsized or non-adversarial contract entry. Prefer parsing the relevant token from `benchmark_id` / `path_parts` and falling back to something neutral like `n/a` when no explicit variant exists.

Also applies to: 918-923
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@benches/profiling_suite.rs`:
- Around line 118-122: The print_alloc_banner_once function currently prints an
allocation banner unconditionally; gate that non-essential benchmark diagnostic
behind the bench-logging feature so it only prints when bench-logging is enabled
(e.g., surround the println! call or the call_once invocation with cfg(feature =
"bench-logging") or make the closure no-op unless the feature is enabled);
update the same pattern at the other occurrence referenced (lines ~655-656) so
both places only emit the banner when bench-logging is active, keeping the
Once/ONCE usage intact.
In `@docs/dev/commands.md`:
- Around line 106-112: The docs claim "`just check` validates the default
DenseSlotMap backend plus all feature combinations" which is inaccurate; update
the sentence in docs/dev/commands.md to state that "`just check` runs checks for
the default backend (DenseSlotMap) and also performs an all-features pass" (or
similar wording), and explicitly note that the justfile only runs clippy for the
default feature set and for --all-features; preserve the existing guidance to
run the legacy SlotMap backend explicitly with `just check-storage-backends`.
In `@scripts/benchmark_models.py`:
- Around line 29-33: The comparison_key property currently maps all unsized
benchmarks (points is None) in the same dimension to "unsized_<dimension>"
causing collisions; update comparison_key (the property method in the class) so
that if self.points is None it requires a real self.benchmark_id and raises a
clear exception (e.g., ValueError) when benchmark_id is missing, rather than
falling back to "unsized_<dimension>"; this forces callers to supply a unique
benchmark_id for unsized workloads and prevents silent overwrites in
baseline/comparison dicts.
In `@scripts/benchmark_utils.py`:
- Around line 942-1008: The parser _parse_ci_performance_suite_results currently
accepts any estimates.json under self.circumsphere_results_dir and can pick up
stale files; modify it (and the companion _process_ci_performance_suite_results)
to only consider benchmark IDs listed in the CI-suite manifest emitted by
ci_performance_suite.rs: capture the manifest from the benchmark stdout at
run-time and pass it (or store it under a known path) to the code that builds
estimates_by_id, then when iterating over
sorted(criterion_dir.glob("**/estimates.json")) skip any estimates_path whose
relative path parts (the benchmark_id formed by path_parts) are not present in
the manifest set before adding to estimates_by_id; ensure fallback behavior when
no manifest exists remains safe (e.g., return [] or keep current logic) and
update any callers that run the benchmark to write/forward the manifest to the
same results directory so _parse_ci_performance_suite_results and
_process_ci_performance_suite_results can use it.
In `@scripts/ci/capture_profiling_metadata.sh`:
- Around line 35-44: The script echoes GITHUB_REF_NAME and RUNNER_OS without
defaults which will fail under set -u when not running in GitHub Actions; update
the echo lines to use parameter expansion defaults (e.g. replace
${GITHUB_REF_NAME} with ${GITHUB_REF_NAME:-local} and ${RUNNER_OS} with
${RUNNER_OS:-$(uname -s)}) so the script runs locally and in CI without error,
keeping the rest of the variable names (declared_toolchain, rust_version,
benchmark_filter, profiling_mode) unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 47b7fb3e-a41e-4c6c-8ff1-b66ab13ef144
📒 Files selected for processing (23)
- .github/workflows/benchmarks.yml
- .github/workflows/ci.yml
- .github/workflows/generate-baseline.yml
- .github/workflows/profiling-benchmarks.yml
- Cargo.toml
- benches/PERFORMANCE_RESULTS.md
- benches/README.md
- benches/ci_performance_suite.rs
- benches/microbenchmarks.rs
- benches/profiling_suite.rs
- docs/code_organization.md
- docs/dev/commands.md
- examples/convex_hull_3d_100_points.rs
- examples/pachner_roundtrip_4d.rs
- justfile
- scripts/README.md
- scripts/benchmark_models.py
- scripts/benchmark_utils.py
- scripts/ci/capture_profiling_metadata.sh
- scripts/tests/test_benchmark_models.py
- scripts/tests/test_benchmark_utils.py
- src/lib.rs
- tests/prelude_exports.rs
💤 Files with no reviewable changes (3)
- docs/code_organization.md
- Cargo.toml
- benches/microbenchmarks.rs
Introduces a manifest system to accurately track and filter benchmark results from the `ci_performance_suite`. This prevents incorrect performance comparisons due to stale Criterion output directories. Additionally, this change enforces explicit benchmark IDs for unsized workloads to ensure stable comparison keys, improves local profiling metadata capture, and refines conditional allocation logging.
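As context for the "explicit benchmark IDs for unsized workloads" point, a minimal sketch of the comparison-key rule is shown below; the dataclass name, field names, and key format are assumptions drawn from the review prompts, not the script's exact definitions:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BenchmarkRecord:
    dimension: int
    points: int | None = None
    benchmark_id: str | None = None

    @property
    def comparison_key(self) -> str:
        """Stable key used to line up baseline and comparison entries."""
        if self.points is not None:
            return f"{self.dimension}d_{self.points}pts"
        if not self.benchmark_id:
            # Unsized workloads must carry a unique benchmark_id; a generic
            # "unsized_<dimension>" fallback would let entries silently
            # overwrite each other in the baseline/comparison dicts.
            raise ValueError("unsized benchmarks require an explicit benchmark_id")
        return self.benchmark_id
```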
Actionable comments posted: 1
♻️ Duplicate comments (1)
scripts/benchmark_utils.py (1)
185-199: ⚠️ Potential issue | 🟠 Major
Fail closed when a successful CI-suite run does not yield a manifest.
If `_parse_ci_performance_manifest_ids(stdout)` returns nothing here, the sidecar file is removed and both CI parsers later fall back to scanning every matching `estimates.json`. In `compare_with_baseline()` and dev baseline generation, that reintroduces the stale-results problem this manifest was added to prevent. Please make manifest extraction a required postcondition for successful `ci_performance_suite` runs instead of silently dropping back to unfiltered discovery.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/benchmark_utils.py` around lines 185 - 199, In _write_ci_performance_manifest_ids ensure manifest extraction is a required postcondition for successful ci_performance_suite runs: when _parse_ci_performance_manifest_ids(stdout) returns empty, do not unlink the manifest or silently fall back; instead raise a clear exception (e.g., RuntimeError) that includes context (stdout and that ci_performance_suite ran) so callers like compare_with_baseline and dev baseline generation can fail fast; keep writing behavior when manifest_ids is present and still create criterion_dir and write the sorted manifest_ids as before.
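A minimal sketch of the fail-closed behavior requested here, assuming a line-prefixed manifest format and standalone helper names (the real helpers live in `scripts/benchmark_utils.py` and may differ):

```python
from pathlib import Path

# Assumed prefix for manifest lines in the benchmark stdout; illustrative only.
MANIFEST_PREFIX = "api_benchmark_manifest:"


def parse_manifest_ids(stdout: str) -> set[str]:
    """Collect Criterion benchmark IDs from prefixed manifest lines."""
    return {
        line.removeprefix(MANIFEST_PREFIX).strip()
        for line in stdout.splitlines()
        if line.startswith(MANIFEST_PREFIX)
    }


def write_manifest_ids(stdout: str, criterion_dir: Path) -> Path:
    """Persist the manifest sidecar, failing fast when it is missing."""
    manifest_ids = parse_manifest_ids(stdout)
    if not manifest_ids:
        # Fail closed: a successful suite run without a manifest should abort
        # instead of silently reverting to unfiltered estimates.json discovery.
        raise RuntimeError(
            "ci_performance_suite ran but emitted no benchmark manifest; "
            "refusing to fall back to unfiltered Criterion discovery"
        )
    criterion_dir.mkdir(parents=True, exist_ok=True)
    sidecar = criterion_dir / "ci_performance_manifest.txt"
    sidecar.write_text("\n".join(sorted(manifest_ids)) + "\n", encoding="utf-8")
    return sidecar
```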
🧹 Nitpick comments (2)
benches/profiling_suite.rs (1)
48-48: Documentation could clarify the minimum sample size.

Line 974 clamps `BENCH_SAMPLE_SIZE` to a minimum of 10 via `size.max(10)`, but the documentation at line 48 only states "(default: 10)" without mentioning this floor. Users setting `BENCH_SAMPLE_SIZE=5` would get 10 samples, which could be unexpected.

📝 Suggested documentation update

```diff
-//! - `BENCH_SAMPLE_SIZE`: Override Criterion sample size (default: 10)
+//! - `BENCH_SAMPLE_SIZE`: Override Criterion sample size (default: 10, minimum: 10)
```

Also applies to: 971-974
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benches/profiling_suite.rs` at line 48, The docs mention BENCH_SAMPLE_SIZE defaulting to 10 but omit that the code enforces a minimum; update the comment/docstring around BENCH_SAMPLE_SIZE to state that values below 10 are clamped to 10 (the clamp is implemented via size.max(10) in the profiling logic), so clarify that setting BENCH_SAMPLE_SIZE=5 will result in 10 samples and reference the clamping behavior in the text.

scripts/benchmark_utils.py (1)
1005-1076: Deduplicate the CI-suite Criterion discovery path.
`_parse_ci_performance_suite_results()` and `CriterionParser._process_ci_performance_suite_results()` now carry near-identical manifest filtering, `base`/`new` preference, ID reconstruction, and sorting logic. This is exactly the kind of code that will drift the next time the manifest or ID format changes. Pulling the shared walk/filter step into one helper would make future fixes much safer.

Also applies to: 1697-1748
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/benchmark_utils.py` around lines 1005 - 1076, The two functions _parse_ci_performance_suite_results and CriterionParser._process_ci_performance_suite_results duplicate the Criterion discovery/walk, manifest filtering, base/new preference, ID reconstruction and group filtering; extract that shared logic into a helper (e.g., _collect_ci_suite_estimates or similar) which takes the criterion_dir and manifest_ids (using _load_ci_performance_manifest_ids) and returns a deduplicated list of (path_parts, estimates_path) or a dict equivalent after applying ci_suite_group_key and choosing "new" over "base"; then have both _parse_ci_performance_suite_results and CriterionParser._process_ci_performance_suite_results call this helper and keep only their result-parsing and sorting (use CI_PERFORMANCE_SUITE_GROUP_ORDER and the existing dimension/input_size parsing in each caller).
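A sketch of what the shared helper could look like; the function name, signature, and the Criterion layout assumption (`.../<benchmark id>/{base,new}/estimates.json`) follow the review comment rather than the current code:

```python
from __future__ import annotations

from pathlib import Path


def collect_ci_suite_estimates(
    criterion_dir: Path,
    manifest_ids: set[str] | None,
) -> dict[str, Path]:
    """Map each benchmark ID to its preferred estimates.json file."""
    estimates_by_id: dict[str, Path] = {}
    for estimates_path in sorted(criterion_dir.glob("**/estimates.json")):
        parts = estimates_path.relative_to(criterion_dir).parts
        if len(parts) < 2:
            continue
        sample_kind = parts[-2]  # typically "base" or "new"
        benchmark_id = "/".join(parts[:-2])
        if manifest_ids is not None and benchmark_id not in manifest_ids:
            continue  # skip stale results not listed in this run's manifest
        # Prefer "new" samples over "base" when both directories exist.
        if sample_kind == "new" or benchmark_id not in estimates_by_id:
            estimates_by_id[benchmark_id] = estimates_path
    return estimates_by_id
```

Both CI-suite parsers could then delegate discovery to this one place and keep only their result parsing and sorting.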
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/benchmark_utils.py`:
- Around line 545-561: The helper _run_ci_performance_suite currently always
appends DEV_MODE_BENCH_ARGS making trusted-profile runs still use dev sampling;
change the function signature to accept an explicit boolean (e.g., use_dev_mode:
bool = False) or a sampling_mode flag, and build cargo_args so that
DEV_MODE_BENCH_ARGS are only extended when use_dev_mode is true; keep
TRUSTED_BENCH_PROFILE logic as-is but omit DEV_MODE_BENCH_ARGS by default so
full/trusted runs produce proper full-sampling Criterion data (update callers to
pass True where dev mode is intended).
---
Duplicate comments:
In `@scripts/benchmark_utils.py`:
- Around line 185-199: In _write_ci_performance_manifest_ids ensure manifest
extraction is a required postcondition for successful ci_performance_suite runs:
when _parse_ci_performance_manifest_ids(stdout) returns empty, do not unlink the
manifest or silently fall back; instead raise a clear exception (e.g.,
RuntimeError) that includes context (stdout and that ci_performance_suite ran)
so callers like compare_with_baseline and dev baseline generation can fail fast;
keep writing behavior when manifest_ids is present and still create
criterion_dir and write the sorted manifest_ids as before.
---
Nitpick comments:
In `@benches/profiling_suite.rs`:
- Line 48: The docs mention BENCH_SAMPLE_SIZE defaulting to 10 but omit that the
code enforces a minimum; update the comment/docstring around BENCH_SAMPLE_SIZE
to state that values below 10 are clamped to 10 (the clamp is implemented via
size.max(10) in the profiling logic), so clarify that setting
BENCH_SAMPLE_SIZE=5 will result in 10 samples and reference the clamping
behavior in the text.
In `@scripts/benchmark_utils.py`:
- Around line 1005-1076: The two functions _parse_ci_performance_suite_results
and CriterionParser._process_ci_performance_suite_results duplicate the
Criterion discovery/walk, manifest filtering, base/new preference, ID
reconstruction and group filtering; extract that shared logic into a helper
(e.g., _collect_ci_suite_estimates or similar) which takes the criterion_dir and
manifest_ids (using _load_ci_performance_manifest_ids) and returns a
deduplicated list of (path_parts, estimates_path) or a dict equivalent after
applying ci_suite_group_key and choosing "new" over "base"; then have both
_parse_ci_performance_suite_results and
CriterionParser._process_ci_performance_suite_results call this helper and keep
only their result-parsing and sorting (use CI_PERFORMANCE_SUITE_GROUP_ORDER and
the existing dimension/input_size parsing in each caller).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 83e0a23f-1615-4ae0-abb6-9abdcd6ff6dd
📒 Files selected for processing (7)
- benches/profiling_suite.rs
- docs/dev/commands.md
- scripts/benchmark_models.py
- scripts/benchmark_utils.py
- scripts/ci/capture_profiling_metadata.sh
- scripts/tests/test_benchmark_models.py
- scripts/tests/test_benchmark_utils.py
🚧 Files skipped from review as they are similar to previous changes (2)
- docs/dev/commands.md
- scripts/tests/test_benchmark_utils.py
This commit introduces comprehensive improvements to the Python scripts for benchmarking, profiling, and hardware analysis. Key changes focus on improving code robustness, type safety, and overall maintainability.

- **New Development Guidelines:** Adds `docs/dev/python.md` to formalize best practices for Python automation, guiding developers on rigorous typing, structured subprocess mocking, and precise exception handling.
- **Stricter Linting and Typing:** Updates `pyproject.toml` to enable additional Ruff rules for type annotation coverage (`ANN201`, `ANN202`, `ANN204`) and re-enable `BLE001` (blind exception catch), applying these quality checks across all Python scripts.
- **Robust Exception Handling:** Replaces broad `except Exception` clauses with specific, recoverable error families (`_RECOVERABLE_CLI_ERRORS`) within core benchmark, hardware, and comparison utilities, preventing silent failures and improving debuggability.
- **Typed Subprocess Mocks:** Updates test fixtures to return fully typed `subprocess.CompletedProcess` objects, aligning with new mocking guidance and enhancing test reliability for command-line interactions.
- **Internal Refactorings:** Extracts and clarifies helper functions in `benchmark_utils.py`, `hardware_utils.py`, and `postprocess_changelog.py` to improve code readability and reusability.
- **Dev Mode Benchmark Options:** Enables `ci_performance_suite` to conditionally apply reduced Criterion sampling arguments for faster local development feedback.
- **Benchmark Config Clarification:** Adds a note to `profiling_suite.rs` clarifying Criterion's `BENCH_SAMPLE_SIZE` clamping behavior.
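As a rough illustration of the "Dev Mode Benchmark Options" item above, here is a sketch of how reduced-sampling arguments could be gated behind an explicit flag; the constant values, helper name, and chosen Criterion CLI options are assumptions, not the script's actual definitions:

```python
from __future__ import annotations

import subprocess

# Illustrative reduced-sampling flags for quick local feedback only.
DEV_MODE_BENCH_ARGS = ["--sample-size", "10", "--measurement-time", "2"]


def run_ci_performance_suite(use_dev_mode: bool = False) -> subprocess.CompletedProcess[str]:
    """Run the CI benchmark suite, optionally with reduced Criterion sampling."""
    cargo_args = [
        "cargo", "bench", "--profile", "perf",
        "--bench", "ci_performance_suite",
    ]
    if use_dev_mode:
        # Only dev-mode runs get reduced sampling; trusted/full runs keep
        # Criterion defaults so baselines stay comparable.
        cargo_args += ["--", *DEV_MODE_BENCH_ARGS]
    return subprocess.run(cargo_args, check=True, capture_output=True, text=True)
```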
🧹 Nitpick comments (2)
scripts/compare_storage_backends.py (1)
47-55: Keep `KeyError` and `TypeError` out of the top-level recoverable set.

Those two now get swallowed by `run_comparison()`, `_run_benchmark()`, and `main()`, so a broken result shape or report-generation bug is downgraded to a generic CLI failure instead of surfacing the actual programming error. The JSON parsing path already handles malformed data locally, which is the right place for these if you really need them.

Suggested fix

```diff
 _RECOVERABLE_COMPARISON_ERRORS: tuple[type[BaseException], ...] = (
     ExecutableNotFoundError,
     OSError,
     RuntimeError,
-    TypeError,
     ValueError,
-    KeyError,
     subprocess.SubprocessError,
 )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/compare_storage_backends.py` around lines 47 - 55, The top-level recoverable set _RECOVERABLE_COMPARISON_ERRORS currently includes KeyError and TypeError which causes programming errors to be swallowed inside run_comparison, _run_benchmark, and main; remove KeyError and TypeError from the tuple so those exceptions propagate normally and are not downgraded to generic CLI failures, leaving any localized JSON/parsing handlers to catch malformed data where appropriate (verify JSON parsing code still handles KeyError/TypeError locally).

scripts/hardware_utils.py (1)
162-171: Fall back to `/proc/cpuinfo` when `lscpu` is present but unusable.

Right now any execution/parsing failure in `_get_linux_cpu_cores_from_lscpu()` returns `"Unknown"` and stops the lookup, even though `/proc/cpuinfo` may still have usable data. That regresses reliability on stripped containers and odd distros where `lscpu` exists but its output is incomplete.

Suggested fix

```diff
 def _get_linux_cpu_cores(self) -> str:
     """
     Get CPU core count on Linux.

     Returns:
         CPU core count or "Unknown"
     """
-    if not shutil.which("lscpu"):
-        return self._get_linux_cpu_cores_from_proc()
-    return self._get_linux_cpu_cores_from_lscpu()
+    if shutil.which("lscpu"):
+        cores = self._get_linux_cpu_cores_from_lscpu()
+        if cores != "Unknown":
+            return cores
+    return self._get_linux_cpu_cores_from_proc()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/hardware_utils.py` around lines 162 - 171, The current _get_linux_cpu_cores returns "Unknown" whenever _get_linux_cpu_cores_from_lscpu fails, preventing a fallback to /proc; modify _get_linux_cpu_cores so it first checks shutil.which("lscpu") and if present calls _get_linux_cpu_cores_from_lscpu but catches any exception or an "Unknown" result and then calls _get_linux_cpu_cores_from_proc as a fallback; ensure exceptions from _get_linux_cpu_cores_from_lscpu are handled (no crash) and the final returned string comes from _get_linux_cpu_cores_from_proc when lscpu output is unusable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@scripts/compare_storage_backends.py`:
- Around line 47-55: The top-level recoverable set
_RECOVERABLE_COMPARISON_ERRORS currently includes KeyError and TypeError which
causes programming errors to be swallowed inside run_comparison, _run_benchmark,
and main; remove KeyError and TypeError from the tuple so those exceptions
propagate normally and are not downgraded to generic CLI failures, leaving any
localized JSON/parsing handlers to catch malformed data where appropriate
(verify JSON parsing code still handles KeyError/TypeError locally).
In `@scripts/hardware_utils.py`:
- Around line 162-171: The current _get_linux_cpu_cores returns "Unknown"
whenever _get_linux_cpu_cores_from_lscpu fails, preventing a fallback to /proc;
modify _get_linux_cpu_cores so it first checks shutil.which("lscpu") and if
present calls _get_linux_cpu_cores_from_lscpu but catches any exception or an
"Unknown" result and then calls _get_linux_cpu_cores_from_proc as a fallback;
ensure exceptions from _get_linux_cpu_cores_from_lscpu are handled (no crash)
and the final returned string comes from _get_linux_cpu_cores_from_proc when
lscpu output is unusable.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 0ff11e9e-f1eb-4fdd-88b3-718f349fc796
📒 Files selected for processing (14)
- benches/profiling_suite.rs
- docs/dev/python.md
- pyproject.toml
- scripts/benchmark_utils.py
- scripts/compare_storage_backends.py
- scripts/hardware_utils.py
- scripts/postprocess_changelog.py
- scripts/tests/conftest.py
- scripts/tests/test_benchmark_models.py
- scripts/tests/test_benchmark_utils.py
- scripts/tests/test_compare_storage_backends.py
- scripts/tests/test_hardware_utils.py
- scripts/tests/test_subprocess_utils.py
- scripts/tests/test_tag_release.py
✅ Files skipped from review due to trivial changes (4)
- scripts/tests/test_tag_release.py
- docs/dev/python.md
- scripts/tests/test_compare_storage_backends.py
- scripts/tests/test_subprocess_utils.py
🚧 Files skipped from review as they are similar to previous changes (3)
- scripts/tests/test_benchmark_models.py
- benches/profiling_suite.rs
- scripts/benchmark_utils.py
Summary
- Local `just profile` support for comparing code refs and compiler versions
- `ci_performance_suite` expanded beyond construction to cover hulls, boundary traversal, validation, adversarial inputs, and bistellar flips

Benchmark Contract Notes
This PR intentionally prioritizes the benchmark contract going forward over compatibility with older tags. Local profiling now uses `cargo bench --profile perf` so current and future runs match CI/performance baselines, even if some older tags predate the `perf` profile.

Existing benchmark IDs are preserved where possible. New adversarial cases are added as `_adversarial` siblings, so existing baseline rows continue to line up while robust-input coverage appears as explicit new rows.

The `api_benchmark_manifest` output is log-readable and versioned with `schema=1`. That gives each benchmark run a compact record of the public API surface, dimensions, and Criterion IDs covered by that crate version.

Validation

- `cargo bench --profile perf --bench ci_performance_suite --no-run`
- `cargo bench --profile perf --bench ci_performance_suite -- --test`
- `cargo bench --profile perf --bench ci_performance_suite -- boundary_facets`
- `just check`