
perf: expand profiling benchmarks around public API workflows#349

Merged
acgetchell merged 8 commits into main from perf/bench-profile-toolchains
Apr 28, 2026
Conversation

Owner

@acgetchell acgetchell commented Apr 27, 2026

Summary

  • Run profiling comparisons with the checked-out crate toolchain by default
  • Add local just profile support for comparing code refs and compiler versions
  • Expand ci_performance_suite beyond construction to cover hulls, boundary traversal, validation, adversarial inputs, and bistellar flips
  • Emit a versioned API benchmark manifest so benchmark logs show which public workflows were measured

Benchmark Contract Notes

This PR intentionally prioritizes the benchmark contract going forward over compatibility with older tags. Local profiling now uses cargo bench --profile perf so current and future runs match CI/performance baselines, even if some older tags predate the perf profile.

Existing benchmark IDs are preserved where possible. New adversarial cases are added as _adversarial siblings, so existing baseline rows continue to line up while robust-input coverage appears as explicit new rows.

The api_benchmark_manifest output is log-readable and versioned with schema=1. That gives each benchmark run a compact record of the public API surface, dimensions, and Criterion IDs covered by that crate version.
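The exact manifest format is not reproduced in this description, but as a purely hypothetical illustration of what a log-readable, versioned record enables, downstream tooling could pull fields out of a benchmark log with standard text utilities (the line format, field names, and values below are invented for the sketch):

```shell
# Hypothetical manifest line; the real format lives in benches/ci_performance_suite.rs.
manifest_line='api_benchmark_manifest schema=1 group=boundary_facets dims=3,4 ids=boundary_facets_3d/1000'

# Split the space-separated key=value fields and pick out individual keys.
schema=$(printf '%s\n' "$manifest_line" | tr ' ' '\n' | awk -F= '$1=="schema"{print $2}')
group=$(printf '%s\n' "$manifest_line" | tr ' ' '\n' | awk -F= '$1=="group"{print $2}')

echo "schema=$schema group=$group"
```

Because the record is versioned with `schema=1`, a parser like this can refuse (or branch on) future schema bumps instead of silently misreading reordered fields.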

Validation

  • cargo bench --profile perf --bench ci_performance_suite --no-run
  • cargo bench --profile perf --bench ci_performance_suite -- --test
  • cargo bench --profile perf --bench ci_performance_suite -- boundary_facets
  • just check

- Run profiling comparisons with the checked-out crate toolchain by default
- Add local `just profile` support for comparing code refs and compiler versions
- Expand `ci_performance_suite` beyond construction to cover hulls, boundary traversal, validation, and bistellar flips
- Emit a versioned API benchmark manifest so benchmark logs show which public workflows were measured
Contributor

coderabbitai Bot commented Apr 27, 2026

Walkthrough

Adds a CI-focused public API benchmarking suite and manifest, reworks local profiling tooling (just recipe) to support parameterized toolchains/worktrees, captures per-run profiling environment metadata to profiling-results/environment_metadata.md, extends benchmark parsing/summary tooling and tests to support unsized/ID'd benchmarks, exposes a prelude::generators module, removes legacy microbenchmarks, and pins/adjusts CI workflow tool versions and artifact handling.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **CI Workflows & Artifact Handling**<br>`.github/workflows/profiling-benchmarks.yml`, `.github/workflows/benchmarks.yml`, `.github/workflows/ci.yml`, `.github/workflows/generate-baseline.yml` | Remove the explicit Rust toolchain pin in the profiling workflow, add a metadata capture step, pin uv via `UV_VERSION`, and ensure `profiling-results/` artifacts (including `environment_metadata.md`) are produced and uploaded. |
| **Profiling Metadata Script**<br>`scripts/ci/capture_profiling_metadata.sh` | New script generating `profiling-results/environment_metadata.md` with toolchain/channel, rustc/cargo versions, commit/branch, cargo profile, benchmark filter, and profiling mode. |
| **Local Profile Recipe & Help**<br>`justfile` | Replace the fixed profile recipe with a parameterized `profile toolchain="" code_ref="current"` recipe that can install a toolchain and create a worktree; add a `check-storage-backends` recipe and update help text. |
| **CI Public API Benchmarks & Reports**<br>`benches/ci_performance_suite.rs`, `benches/PERFORMANCE_RESULTS.md`, `benches/README.md` | Introduce `ci_performance_suite` covering construction, boundary facets, convex hull, validation, insertion, and bistellar flips; emit a runtime manifest; update the performance report and docs to include the new public API contract. |
| **Profiling & Validation Bench Suite**<br>`benches/profiling_suite.rs` | Add Validation Layer Diagnostics, adversarial seed discovery, an Adversarial generator, a memory-allocation counting refactor, and rename/register benchmark groups. |
| **Removed Microbenchmarks**<br>`benches/microbenchmarks.rs`, `Cargo.toml` (bench entry) | Remove `benches/microbenchmarks.rs` and delete the `[[bench]]` manifest entry `microbenchmarks`. |
| **Benchmark Summary Tooling & Tests**<br>`scripts/benchmark_utils.py`, `scripts/tests/test_benchmark_utils.py`, `scripts/benchmark_models.py`, `scripts/tests/test_benchmark_models.py` | Add CI-suite runner/manifest emission and parsing, prefer manifest-listed `estimates.json`, support Benchmark ID/unsized workloads via `comparison_key`, extend baseline/markdown rendering, and add corresponding tests. |
| **Prelude & Library Surface**<br>`src/lib.rs`, `tests/prelude_exports.rs` | Add `pub mod prelude::generators` re-exports, relax the `is_normal` signature (drop the `Sized` bound), update docs, and add a prelude smoke test. |
| **Examples & Small Refactors**<br>`examples/convex_hull_3d_100_points.rs`, `examples/pachner_roundtrip_4d.rs` | Update imports to `delaunay::prelude::generators` and rename small helpers and imports; no behavioral changes. |
| **Python Scripts & Tests Typing / Error Handling**<br>`scripts/*.py`, `scripts/tests/*` (e.g., `hardware_utils.py`, `compare_storage_backends.py`, `tests/*`) | Tighten exception handling to recoverable error sets, add explicit return/type annotations in scripts and tests, refactor helpers, and improve subprocess mocking/fixtures. |
| **Tooling & Linting Config**<br>`pyproject.toml`, `docs/dev/python.md` | Update Ruff rules (enable ANN201/202/204, C90), set `max-complexity`, and add Python developer guidance for typing, testing, and subprocess mocking. |
| **Docs & Release Guidance**<br>`scripts/README.md`, `docs/code_organization.md`, `docs/dev/commands.md`, `benches/README.md` | Document the CI performance suite as the primary summary source, remove microbenchmarks from listings, add `just check-storage-backends` guidance, and update benchmark generation instructions. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Dev as Developer / justfile
    participant CI as GitHub Actions
    participant Git as Git (checkout / worktree)
    participant Cargo as rustup / cargo
    participant Script as capture_metadata.sh
    participant FS as Filesystem / Artifacts

    Dev->>CI: push / trigger profiling workflow
    CI->>Git: checkout repo (optional worktree for code_ref)
    CI->>Cargo: setup Rust toolchain (rust-toolchain.toml or requested)
    CI->>Cargo: run `cargo bench --profile perf --bench ci_performance_suite`
    Cargo->>FS: emit `target/criterion/**` and `profiling-results/` artifacts
    CI->>Script: run `./scripts/ci/capture_profiling_metadata.sh` (BENCH_FILTER / PROFILE_METADATA_MODE)
    Script->>FS: write `profiling-results/environment_metadata.md`
    CI->>FS: upload `profiling-results/*` as workflow artifacts
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Poem

🐰 I hop through commits, neat and spry,
I capture metadata while benchmarks fly,
Manifests printed and seeds I seek,
Triangles counted, diagnostics speak,
Carrots for CI — a small celebratory nibble!

🚥 Pre-merge checks: ✅ 5 passed

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main change: expanding profiling benchmarks to cover public API workflows beyond just construction. |
| Description check | ✅ Passed | The description clearly relates to the changeset, detailing the key objectives: toolchain handling, local profiling support, expanded benchmark coverage, and manifest emission. |
| Docstring coverage | ✅ Passed | Docstring coverage is 93.50%, which is sufficient; the required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Skipped because no linked issues were found for this pull request. |



@coderabbitai coderabbitai Bot added the `enhancement` (New feature or request) and `rust` (Pull requests that update rust code) labels Apr 27, 2026

codacy-production Bot commented Apr 27, 2026

Up to standards ✅

🟢 Issues: 0 new issues

View in Codacy

🟢 Metrics: complexity 34

View in Codacy

🟢 Coverage: 100.00% diff coverage · +0.00% coverage variation

| Metric | Result |
|---|---|
| Coverage variation | +0.00% (-1.00%) |
| Diff coverage | 100.00% |

View coverage diff in Codacy

Coverage variation details

| Commit | Coverable lines | Covered lines | Coverage |
|---|---|---|---|
| Common ancestor (6e437fb) | 50908 | 45309 | 89.00% |
| Head (70703fb) | 50908 (+0) | 45309 (+0) | 89.00% (+0.00%) |

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details

| Change set | Coverable lines | Covered lines | Diff coverage |
|---|---|---|---|
| Pull request (#349) | 4 | 4 | 100.00% |

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified> / <coverable lines added or modified> * 100%


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/profiling-benchmarks.yml:
- Around line 121-132: The TOML metadata extraction commands (the
declared_toolchain and rust_version command substitutions) are currently fatal
under set -euo pipefail when files or keys are missing; change them to be
best-effort by making the command substitutions non-fatal (e.g., ensure failures
in the pipelines do not cause the script to exit) so missing rust-toolchain.toml
or Cargo.toml simply produce empty/default variables; update both
declared_toolchain and rust_version extractions (and the analogous block around
lines 320-331) to swallow errors or provide a safe fallback so metadata
collection is optional rather than a hard failure.
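The fix the comment describes boils down to one shell idiom: wrapping the failing pipeline so its non-zero exit is swallowed inside the command substitution. A minimal sketch (file and key names follow the comment; the exact extraction commands in the workflow may differ, and the scratch setup here is purely for illustration):

```shell
set -euo pipefail

# Work in a scratch dir with a sample rust-toolchain.toml; Cargo.toml is
# deliberately absent to exercise the fallback path.
cd "$(mktemp -d)"
printf 'channel = "1.84.0"\n' > rust-toolchain.toml

# grep exits non-zero on a missing file or key, which would abort the script
# under `set -euo pipefail`; grouping with `|| true` makes the substitution
# best-effort, so failures just yield an empty variable.
declared_toolchain=$( { grep -E '^channel' rust-toolchain.toml 2>/dev/null || true; } | head -n1 | cut -d'"' -f2 )
rust_version=$( { grep -E '^rust-version' Cargo.toml 2>/dev/null || true; } | head -n1 | cut -d'"' -f2 )

echo "toolchain=${declared_toolchain:-<unset>} rust-version=${rust_version:-<unset>}"
```

The same pattern applies verbatim to the analogous block later in the workflow: metadata collection stays optional instead of failing the job.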

In `@benches/ci_performance_suite.rs`:
- Around line 336-405: The benchmark is measuring candidate discovery plus flip
cost because roundtrip_k2_4d and roundtrip_k3_4d scan inside the timed closure;
move discovery into the setup closure by extracting functions that find a
flippable handle (e.g. find_flippable_k2_facet_4d and
find_flippable_k3_ridge_4d) using the existing collect_interior_facets_4d and
collect_ridges_4d iteration logic, then change roundtrip_k2_4d and
roundtrip_k3_4d to take the found handle as an argument and only perform
dt.flip_k2 / dt.flip_k3 and their inverse; update the benchmark to clone base_dt
in the setup, call the new finder there to return (dt, facet/ridge), and keep
the timed closure to call the adjusted roundtrip functions so only the flip
roundtrip is measured.
- Around line 210-222: prepare_triangulation currently only generates a single
well-conditioned seeded random triangulation (via prepare_benchmark_data and
BenchTriangulation::new_with_options), so CI benchmarks never exercise
adversarial geometry; update the benchmark data generation to include
adversarial datasets by either extending prepare_triangulation to accept a
mode/seed that constructs adversarial inputs (near-boundary points, cospherical
sets, degenerate simplices, large coordinates) or add a companion function
(e.g., prepare_adversarial_triangulation) that uses prepare_benchmark_data or a
custom generator to inject those edge cases, then update the benchmarks that
call prepare_triangulation to run both the well-conditioned and adversarial
variants (or parameterize over seeds and RetryPolicy::Shuffled base_seed) so
each benchmark exercises both robust and well-conditioned inputs using
BenchTriangulation, ConstructionOptions, and RetryPolicy.

In `@justfile`:
- Around line 440-452: The command-substitution that sets requested_toolchain
should be made tolerant to failures so set -euo pipefail doesn't abort early:
modify the subshell/pipeline that greps rust-toolchain.toml for channel to
suppress non-zero exit (e.g., append a fallback like "|| true" to the pipeline
or to the grep) so it returns an empty string instead of failing, then keep the
existing if [[ -z "$requested_toolchain" ]] check to show the user-facing error
and exit; target the assignment that populates the requested_toolchain variable
from rust-toolchain.toml/channel.
- Around line 472-490: The profile recipe writes metadata saying "Cargo profile:
cargo bench default" but invokes rustup run "$requested_toolchain" cargo bench
--bench ci_performance_suite without the perf profile; update the invocation of
cargo bench in the block that runs in "$workdir" (the rustup run ... cargo bench
--bench ci_performance_suite command) to include the CI perf profile (add
--profile perf) and also update the metadata written to
"$run_dir/profile_metadata.md" (the "Cargo profile:" line) to reflect "cargo
bench --profile perf" so local runs match CI baselines.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 15aa59a8-db11-48b8-be8d-e6af1a08046d

📥 Commits

Reviewing files that changed from the base of the PR and between 6e437fb and 5f3e029.

📒 Files selected for processing (3)
  • .github/workflows/profiling-benchmarks.yml
  • benches/ci_performance_suite.rs
  • justfile

@acgetchell acgetchell enabled auto-merge (squash) April 27, 2026 17:52

codecov Bot commented Apr 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.98%. Comparing base (6e437fb) to head (70703fb).
⚠️ Report is 1 commit behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
```
@@           Coverage Diff           @@
##             main     #349   +/-   ##
=======================================
  Coverage   88.98%   88.98%
=======================================
  Files          58       58
  Lines       50713    50713
=======================================
  Hits        45125    45125
  Misses       5588     5588
```

| Flag | Coverage Δ |
|---|---|
| unittests | 88.98% <ø> (ø) |

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

- Run profiling benchmarks against the selected code and Rust toolchain versions.
- Expand ci_performance_suite around public API workflows, adversarial inputs, incremental insertion, validation, convex hulls, and bistellar flip roundtrips.
- Retire microbenchmarks in favor of CI and profiling suite coverage.
- Update benchmark summary generation and performance results for the expanded CI suite, with circumsphere results grouped separately.
- Add focused prelude exports for generator-heavy doctests, examples, tests, and benchmarks.
@coderabbitai coderabbitai Bot added the `documentation` (Improvements or additions to documentation), `geometry` (Geometry-related issues), `api`, and `topology` labels Apr 27, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
benches/profiling_suite.rs (1)

530-545: ⚠️ Potential issue | 🟠 Major

Keep actual_point_counts bookkeeping out of measure(...).

actual_point_counts.push(points.len()) runs inside the counted closure, so once that SmallBuffer grows past its inline capacity it starts charging benchmark-harness allocations as triangulation allocations. With BENCHMARK_ITERATION_BUFFER_SIZE = 8 and the default Criterion sample size of 10, this skews the memory numbers in the common case.

💡 Suggested fix
```diff
                 for _ in 0..iters {
                     let start_time = Instant::now();
+                    let mut actual_count = 0usize;

                     let alloc_info = measure(|| {
                         let points =
                             gen_points::<D>(count, PointDistribution::Random, DEFAULT_SEED);
                         let vertices: Vec<_> = points.iter().map(|p| vertex!(*p)).collect();
-                        actual_point_counts.push(points.len()); // Track actual count
+                        actual_count = points.len();
                         if let Ok(dt) = DelaunayTriangulationBuilder::new(&vertices).build::<()>() {
                             black_box(dt);
                         }
                     });

                     total_time += start_time.elapsed();
                     allocation_infos.push(alloc_info);
+                    actual_point_counts.push(actual_count);
                 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benches/profiling_suite.rs` around lines 530 - 545, The measured closure
currently includes actual_point_counts.push(points.len()), which causes
benchmark allocations to be attributed to the triangulation; to fix, compute and
store the points length outside the measure closure (e.g., let points =
gen_points::<D>(...); let pts_len = points.len(); then call measure with the
triangulation work using points/vertices) and then push pts_len into
actual_point_counts after measure returns; update the block around gen_points,
vertex! mapping, DelaunayTriangulationBuilder::new(...).build::<()>() and
allocation_infos handling so only the triangulation work is measured and
bookkeeping (actual_point_counts.push) occurs outside the measured closure.
🧹 Nitpick comments (5)
benches/ci_performance_suite.rs (3)

412-432: roundtrip_k1_4d picks an arbitrary cell each iteration.

dt.cells().next() returns the first cell from a SlotMap iterator; after flip_k1_insert mutates the topology and flip_k1_remove reverses it, subsequent iterations operate on a freshly cloned base_dt again, so the cell choice is stable across iterations. That's fine — but the choice depends on SlotMap iteration order, which is deterministic for a given map state but not guaranteed to be the centroid-friendly cell. If a future change to the seed/insertion path lands the "first cell" on a sliver, the centroid may fall outside or on a facet and flip_k1_insert will start failing with a panic.

A small hardening: pick the cell with the largest volume (or any deterministic non-degenerate choice) once outside the timed closure and pass its CellKey in via the setup closure.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benches/ci_performance_suite.rs` around lines 412 - 432, roundtrip_k1_4d
currently picks an arbitrary cell via dt.cells().next(), which can pick a
degenerate/sliver cell under different map orders; instead, before the timed
iterations compute a deterministic, non-degenerate CellKey (e.g. iterate all
cells on the cloned base triangulation and pick the one with the largest volume
using cell_centroid_4d/volume helper) and pass that CellKey into the benchmark
closure so roundtrip_k1_4d uses that precomputed CellKey for flip_k1_insert and
flip_k1_remove; modify the setup to produce the chosen CellKey from
FlipTriangulation4 and update roundtrip_k1_4d signature to accept a CellKey (and
use dt.tds().vertex_key_from_uuid/new_uuid logic unchanged).

309-338: Adversarial seed search has no DELAUNAY_BENCH_DISCOVER_SEEDS escape hatch.

prepare_data falls back to find_seed_vertices and uses seed_search_limit() (default 2000) so users can extend the search via DELAUNAY_BENCH_DISCOVER_SEEDS_LIMIT. prepare_adv_data shares the limit but, unlike well-conditioned cases, has no KNOWN_SEEDS table to short-circuit. If the search ever fails for one of the new _adversarial cases, the only recourse will be patching the source — there's no equivalent to the documented DELAUNAY_BENCH_DISCOVER_SEEDS=1 workflow for adversarial inputs.

Consider either (a) adding a KNOWN_ADV_SEEDS table once a stable suite of seeds is established, or (b) extending the discover-seeds hook in benchmark_tds_new_dimension! to also emit the adversarial seed when tds_new_<D>d/tds_new_adversarial/<count> is filtered.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benches/ci_performance_suite.rs` around lines 309 - 338, prepare_adv_data
currently brute-forces seeds with seed_search_limit() and will panic if none
found, but lacks the documented discover-seeds escape hatch; update
prepare_adv_data to (1) consult a new KNOWN_ADV_SEEDS table (map of
dimension+count -> seed) before searching and return the stored
seed/points/vertices if present, and (2) honor the existing
DELAUNAY_BENCH_DISCOVER_SEEDS / DELAUNAY_BENCH_DISCOVER_SEEDS_LIMIT workflow by
checking the env var at runtime: when DELAUNAY_BENCH_DISCOVER_SEEDS is set,
continue the search as now but on a successful candidate from
BenchTriangulation::new_with_options emit a machine-readable line (e.g.
"ADV_SEED D count seed") to stdout or the logger and return instead of panicking
so CI can record the adversarial seed; ensure you still use seed_search_limit()
for the limit and reference prepare_adv_data,
BenchTriangulation::new_with_options, seed_search_limit, and the new
KNOWN_ADV_SEEDS lookup when making changes.

1185-1213: Hoist flip-candidate discovery out of the per-batch setup closure.

flippable_k2_facet_4d / flippable_k3_ridge_4d currently run once per iter_batched batch, linearly scanning all interior facets/ridges and cloning dt for each candidate to trial-flip. Since base_dt is immutable and FacetHandle / RidgeHandle are stable across clone() (SlotMap preserves all key slots and versions exactly), compute the handle once before bench_function and reuse it across batches—eliminating O(N_batches) redundant discovery scans.

♻️ Hoist discovery to once per benchmark
```diff
+    let k2_facet = flippable_k2_facet_4d(&base_dt);
     group.bench_function("k2_roundtrip", |b| {
         b.iter_batched(
-            || {
-                let dt = base_dt.clone();
-                let facet = flippable_k2_facet_4d(&dt);
-                (dt, facet)
-            },
+            || (base_dt.clone(), k2_facet),
             |(mut dt, facet)| {
                 roundtrip_k2_4d(&mut dt, facet);
                 black_box(dt);
             },
             BatchSize::LargeInput,
         );
     });

+    let k3_ridge = flippable_k3_ridge_4d(&base_dt);
     group.bench_function("k3_roundtrip", |b| {
         b.iter_batched(
-            || {
-                let dt = base_dt.clone();
-                let ridge = flippable_k3_ridge_4d(&dt);
-                (dt, ridge)
-            },
+            || (base_dt.clone(), k3_ridge),
             |(mut dt, ridge)| {
                 roundtrip_k3_4d(&mut dt, ridge);
                 black_box(dt);
             },
             BatchSize::LargeInput,
         );
     });
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benches/ci_performance_suite.rs` around lines 1185 - 1213, The benchmark
currently calls flippable_k2_facet_4d and flippable_k3_ridge_4d inside the
iter_batched setup closure, causing a full scan and dt.clone() per batch; hoist
the discovery out so you compute the candidate handles once before each
group.bench_function call (using base_dt to find the FacetHandle / RidgeHandle)
and then pass those handles into iter_batched, leaving only the mutable dt and
the trial flip in the per-batch closure; update the "k2_roundtrip" and
"k3_roundtrip" benchmarks to use the precomputed handles instead of calling
flippable_k2_facet_4d / flippable_k3_ridge_4d inside the closure so discovery
runs once per benchmark.
.github/workflows/profiling-benchmarks.yml (1)

316-350: Same metadata block, duplicated.

The comprehensive-profiling and memory-stress-test jobs now carry near-identical "Capture profiling environment metadata" steps. If you tweak the format or add a field (e.g. cargo bench --profile), you'll need to keep them in sync. Not blocking, but a small composite action or a shared shell script under scripts/ci/ would let you change the format in one place.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/profiling-benchmarks.yml around lines 316 - 350, The
metadata capture block duplicated between the comprehensive-profiling and
memory-stress-test jobs should be centralized: extract the shell sequence
currently in the "Capture memory profiling environment metadata" step into a
single reusable script (e.g. scripts/ci/capture_profiling_metadata.sh) or a
composite GitHub Action, make it executable, and then replace each job's inline
step with a single run calling that script (or uses: ./path/to/composite).
Update references in the comprehensive-profiling and memory-stress-test job
steps so they invoke the new reusable unit and ensure it writes
profiling-results/environment_metadata.md as before.
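A minimal sketch of what such a shared script could look like (hypothetical: the real `scripts/ci/capture_profiling_metadata.sh` may use different field names, and the tool probes here fall back to `unknown` so the sketch runs even without a Rust toolchain installed):

```shell
#!/usr/bin/env bash
# Sketch of a shared metadata capture script; fields are illustrative.
set -euo pipefail

out_dir="${1:-profiling-results}"
mkdir -p "$out_dir"

# Probe tools best-effort: never fail the job just because metadata is missing.
rustc_version=$(rustc --version 2>/dev/null || echo "unknown")
cargo_version=$(cargo --version 2>/dev/null || echo "unknown")
commit=$(git rev-parse --short HEAD 2>/dev/null || echo "unknown")

{
  echo "# Profiling Environment"
  echo "- rustc: $rustc_version"
  echo "- cargo: $cargo_version"
  echo "- commit: $commit"
  echo "- benchmark filter: ${BENCH_FILTER:-<none>}"
  echo "- profiling mode: ${PROFILE_METADATA_MODE:-default}"
} > "$out_dir/environment_metadata.md"
```

Both jobs would then invoke the script with a single `run:` step, so a format change lands in one place.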
scripts/benchmark_utils.py (1)

888-897: Use startswith for bistellar_flips prefix to be future-proof.

The manifest currently limits bistellar_flips benchmarks to 4D only, but all other performance groups (construction, boundary_facets, convex_hull, validation, incremental_insert) are dimension-agnostic. When the Rust suite adds bistellar_flips_3d or bistellar_flips_5d (aligning with the PR's stated intent to expand benchmark coverage), the current exact match == "bistellar_flips_4d" will fail to recognize them, and their results will be silently dropped with no warning. Change to startswith("bistellar_flips") to keep pace with the manifest without manual sync.

Proposed change
```diff
     if first_path_part.startswith("tds_new_"):
         return "construction"
-    if first_path_part == "bistellar_flips_4d":
+    if first_path_part.startswith("bistellar_flips"):
         return "bistellar_flips"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/benchmark_utils.py` around lines 888 - 897, The _ci_suite_group_key
static method currently checks first_path_part == "bistellar_flips_4d", which
will miss future variants like "bistellar_flips_3d"; change that equality to use
first_path_part.startswith("bistellar_flips") so any dimensional suffix maps to
the "bistellar_flips" group, keeping behavior consistent with
CI_PERFORMANCE_SUITE_GROUPS and the other prefix-based checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@benches/ci_performance_suite.rs`:
- Around line 265-279: prepare_inserts currently derives the same seed for both
Dataset::WellConditioned and Dataset::Adversarial which makes adversarial
inserts share RNG state with well-conditioned ones; change the seed derivation
inside prepare_inserts to be dataset-aware (e.g., compute seed =
dim_seed.wrapping_add(0x5151_5151) and then for Dataset::Adversarial apply an
extra twist such as seed ^= 0xA5A5_A5A5) before calling generate_adv_points or
generate_random_points_seeded so that generate_adv_points::<D>(count, seed) and
generate_random_points_seeded::<f64, D>(..., seed) receive distinct,
reproducible seeds while keeping prepare_inserts, generate_adv_points, and
generate_random_points_seeded signatures unchanged.
- Around line 90-133: The hardcoded benchmark_ids in API_BENCHMARK_ENTRIES
(field benchmark_ids on ApiBenchmarkEntry entries like group "construction",
"boundary_facets", etc.) can drift from runtime constants (e.g. COUNTS,
COUNTS_5D and any insert/operation count constants); replace the brittle string
literals by building the benchmark id strings at startup from those constants
(or add a debug_assert in the module that computes expected strings from
COUNTS/COUNTS_5D and compares them against the hardcoded benchmark_ids) so the
manifest always reflects the actual runtime values; locate API_BENCHMARK_ENTRIES
and either convert it to a function that returns Vec<ApiBenchmarkEntry>
constructed from the constants or add a validation helper that runs at init to
assert equality.

In `@benches/profiling_suite.rs`:
- Around line 743-797: Add adversarial benchmark companions by invoking the
existing benchmark_validation_components_dimension! macro to create
`_adversarial` variants: call benchmark_validation_components_dimension!(2,
benchmark_validation_components_2d_adversarial, <count>); and likewise for
3D/4D/5D (use the same or slightly larger counts as needed). Inside those runs,
switch the point generator to produce adversarial/near-boundary inputs (e.g.,
use the adversarial distribution variant of gen_points, such as
PointDistribution::Adversarial or the project’s equivalent) so the macro will
build triangulations from adversarial vertices and register groups named like
"validation_components_2d_adversarial" (the macro already formats group names
from $dim, so keep the suffix by naming the function accordingly). Ensure each
adversarial invocation mirrors the existing benchmark names
(`benchmark_validation_components_<Nd>`) and expectations so the new benches
exercise degenerate/near-boundary cases alongside the current random ones.

In `@justfile`:
- Around line 425-438: The trap for cleanup must be installed before running git
worktree add so a failed worktree add doesn't leak tmp_parent; after creating
tmp_parent and setting workdir, set cleanup_worktree=1 and immediately register
trap cleanup EXIT (i.e., move the trap registration to just after
tmp_parent="$(mktemp -d ...)" / workdir="$tmp_parent/worktree" and before git
worktree add), and ensure the cleanup() function references workdir/tmp_parent
as it does now; keep the existing git worktree remove and rm -rf logic
unchanged.
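The ordering the comment asks for can be demonstrated in isolation. In this sketch, `false` stands in for a failing `git worktree add`; the variable names mirror the recipe but the snippet is a self-contained illustration, not the actual justfile code:

```shell
set -euo pipefail

# Run the recipe skeleton in a child bash so its EXIT trap fires on failure,
# then verify nothing leaked.
tmp_parent="$(mktemp -d)"

bash -c '
  set -euo pipefail
  tmp_parent="$1"
  workdir="$tmp_parent/worktree"
  cleanup() { rm -rf "$tmp_parent"; }  # real recipe also runs git worktree remove
  trap cleanup EXIT                    # registered BEFORE the risky step
  false                                # stand-in for: git worktree add "$workdir" ...
' _ "$tmp_parent" || echo "worktree add failed; cleanup still ran"

if [[ -d "$tmp_parent" ]]; then
  echo "leaked: $tmp_parent"
else
  echo "tmp_parent cleaned up despite failure"
fi
```

Registering the trap after `mktemp -d` but before `git worktree add` means every exit path from that point on, including a failed add under `set -e`, runs the cleanup.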

---

Outside diff comments:
In `@benches/profiling_suite.rs`:
- Around line 530-545: The measured closure currently includes
actual_point_counts.push(points.len()), which causes benchmark allocations to be
attributed to the triangulation; to fix, compute and store the points length
outside the measure closure (e.g., let points = gen_points::<D>(...); let
pts_len = points.len(); then call measure with the triangulation work using
points/vertices) and then push pts_len into actual_point_counts after measure
returns; update the block around gen_points, vertex! mapping,
DelaunayTriangulationBuilder::new(...).build::<()>() and allocation_infos
handling so only the triangulation work is measured and bookkeeping
(actual_point_counts.push) occurs outside the measured closure.
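A minimal Rust sketch of that restructuring, with `measure` and the point generation reduced to stand-ins for the suite's helpers (the real code wraps triangulation construction and allocation counting):

```rust
// Stand-in for the suite's timing/allocation-counting wrapper.
fn measure<T>(work: impl FnOnce() -> T) -> T {
    work()
}

fn run_one_case(actual_point_counts: &mut Vec<usize>) {
    // Stand-in for gen_points::<D>(...).
    let points: Vec<[f64; 3]> = vec![[0.0; 3]; 100];
    let pts_len = points.len(); // computed BEFORE the measured closure
    let _triangulation = measure(|| {
        // only the work under measurement lives here
        points.len()
    });
    actual_point_counts.push(pts_len); // bookkeeping AFTER measure returns
}

fn main() {
    let mut counts = Vec::new();
    run_one_case(&mut counts);
    assert_eq!(counts, vec![100]);
}
```

The point is purely structural: the `Vec` push and length lookup no longer contribute allocations or time to the measured closure.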

---

Nitpick comments:
In @.github/workflows/profiling-benchmarks.yml:
- Around line 316-350: The metadata capture block duplicated between the
comprehensive-profiling and memory-stress-test jobs should be centralized:
extract the shell sequence currently in the "Capture memory profiling
environment metadata" step into a single reusable script (e.g.
scripts/ci/capture_profiling_metadata.sh) or a composite GitHub Action, make it
executable, and then replace each job's inline step with a single run calling
that script (or uses: ./path/to/composite). Update references in the
comprehensive-profiling and memory-stress-test job steps so they invoke the new
reusable unit and ensure it writes profiling-results/environment_metadata.md as
before.

In `@benches/ci_performance_suite.rs`:
- Around line 412-432: roundtrip_k1_4d currently picks an arbitrary cell via
dt.cells().next(), which can pick a degenerate/sliver cell under different map
orders; instead, before the timed iterations compute a deterministic,
non-degenerate CellKey (e.g. iterate all cells on the cloned base triangulation
and pick the one with the largest volume using cell_centroid_4d/volume helper)
and pass that CellKey into the benchmark closure so roundtrip_k1_4d uses that
precomputed CellKey for flip_k1_insert and flip_k1_remove; modify the setup to
produce the chosen CellKey from FlipTriangulation4 and update roundtrip_k1_4d
signature to accept a CellKey (and use dt.tds().vertex_key_from_uuid/new_uuid
logic unchanged).
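One way to make the selection deterministic, sketched with plain stand-in types (`u32` keys and `f64` volumes instead of the crate's `CellKey` and volume helper):

```rust
// Pick the largest-volume cell once, during setup, so the benchmark never
// flips a degenerate/sliver cell that happened to come first in map order.
fn pick_largest_cell(volumes_by_key: &[(u32, f64)]) -> Option<u32> {
    volumes_by_key
        .iter()
        .max_by(|a, b| a.1.partial_cmp(&b.1).expect("volumes are finite"))
        .map(|(key, _)| *key)
}

fn main() {
    // (key, volume) pairs; the sliver (key 7) must not be chosen
    let cells = [(3, 0.25), (7, 1e-12), (9, 0.40)];
    assert_eq!(pick_largest_cell(&cells), Some(9));
}
```

The chosen key can then be passed into the benchmark closure so every iteration of `flip_k1_insert`/`flip_k1_remove` operates on the same well-shaped cell.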
- Around line 309-338: prepare_adv_data currently brute-forces seeds with
seed_search_limit() and will panic if none found, but lacks the documented
discover-seeds escape hatch; update prepare_adv_data to (1) consult a new
KNOWN_ADV_SEEDS table (map of dimension+count -> seed) before searching and
return the stored seed/points/vertices if present, and (2) honor the existing
DELAUNAY_BENCH_DISCOVER_SEEDS / DELAUNAY_BENCH_DISCOVER_SEEDS_LIMIT workflow by
checking the env var at runtime: when DELAUNAY_BENCH_DISCOVER_SEEDS is set,
continue the search as now but on a successful candidate from
BenchTriangulation::new_with_options emit a machine-readable line (e.g.
"ADV_SEED D count seed") to stdout or the logger and return instead of panicking
so CI can record the adversarial seed; ensure you still use seed_search_limit()
for the limit and reference prepare_adv_data,
BenchTriangulation::new_with_options, seed_search_limit, and the new
KNOWN_ADV_SEEDS lookup when making changes.
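The lookup-then-search flow can be sketched as follows. `try_seed`, `limit`, and `discover_mode` are stand-ins for `BenchTriangulation::new_with_options`, `seed_search_limit()`, and the `DELAUNAY_BENCH_DISCOVER_SEEDS` check; the `ADV_SEED` line format is taken from the comment:

```rust
use std::collections::HashMap;

// Consult a known-seeds table first, then fall back to a bounded search,
// optionally reporting discoveries instead of leaving the caller to panic.
fn find_adv_seed(
    known: &HashMap<(usize, usize), u64>, // (dimension, count) -> seed
    dim: usize,
    count: usize,
    try_seed: impl Fn(u64) -> bool,
    limit: u64,
    discover_mode: bool,
) -> Option<u64> {
    if let Some(&seed) = known.get(&(dim, count)) {
        return Some(seed);
    }
    for seed in 0..limit {
        if try_seed(seed) {
            if discover_mode {
                // machine-readable line for CI to record the adversarial seed
                println!("ADV_SEED {dim} {count} {seed}");
            }
            return Some(seed);
        }
    }
    None // caller decides whether to panic; discover mode records instead
}

fn main() {
    let mut known = HashMap::new();
    known.insert((4, 50), 42_u64);
    assert_eq!(find_adv_seed(&known, 4, 50, |_| false, 10, false), Some(42));
    assert_eq!(find_adv_seed(&known, 3, 50, |s| s == 3, 10, true), Some(3));
}
```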
- Around line 1185-1213: The benchmark currently calls flippable_k2_facet_4d and
flippable_k3_ridge_4d inside the iter_batched setup closure, causing a full scan
and dt.clone() per batch; hoist the discovery out so you compute the candidate
handles once before each group.bench_function call (using base_dt to find the
FacetHandle / RidgeHandle) and then pass those handles into iter_batched,
leaving only the mutable dt and the trial flip in the per-batch closure; update
the "k2_roundtrip" and "k3_roundtrip" benchmarks to use the precomputed handles
instead of calling flippable_k2_facet_4d / flippable_k3_ridge_4d inside the
closure so discovery runs once per benchmark.

In `@scripts/benchmark_utils.py`:
- Around line 888-897: The _ci_suite_group_key static method currently checks
first_path_part == "bistellar_flips_4d", which will miss future variants like
"bistellar_flips_3d"; change that equality to use
first_path_part.startswith("bistellar_flips") so any dimensional suffix maps to
the "bistellar_flips" group, keeping behavior consistent with
CI_PERFORMANCE_SUITE_GROUPS and the other prefix-based checks.
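A minimal sketch of the prefix-based check (other group mappings omitted; the fallback behavior is an assumption):

```python
def ci_suite_group_key(first_path_part: str) -> str:
    """Map a benchmark path's first segment to its suite group."""
    # startswith keeps future dimensional variants in the same group,
    # e.g. bistellar_flips_3d / bistellar_flips_5d.
    if first_path_part.startswith("bistellar_flips"):
        return "bistellar_flips"
    return first_path_part
```

With `==` only `bistellar_flips_4d` would have matched; the prefix check keeps all dimensional siblings in one group.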

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 7bea92c5-4a15-4958-9105-8f3d754e113d

📥 Commits

Reviewing files that changed from the base of the PR and between 5f3e029 and a0ab0fe.

📒 Files selected for processing (16)
  • .github/workflows/profiling-benchmarks.yml
  • Cargo.toml
  • benches/PERFORMANCE_RESULTS.md
  • benches/README.md
  • benches/ci_performance_suite.rs
  • benches/microbenchmarks.rs
  • benches/profiling_suite.rs
  • docs/code_organization.md
  • examples/convex_hull_3d_100_points.rs
  • examples/pachner_roundtrip_4d.rs
  • justfile
  • scripts/README.md
  • scripts/benchmark_utils.py
  • scripts/tests/test_benchmark_utils.py
  • src/lib.rs
  • tests/prelude_exports.rs
💤 Files with no reviewable changes (3)
  • docs/code_organization.md
  • Cargo.toml
  • benches/microbenchmarks.rs
✅ Files skipped from review due to trivial changes (3)
  • examples/pachner_roundtrip_4d.rs
  • scripts/README.md
  • benches/PERFORMANCE_RESULTS.md

…uration

- The `ci_performance_suite` now dynamically generates benchmark IDs,
  reducing maintenance overhead and improving consistency. It also leverages
  pre-computed adversarial seeds for stable and reproducible performance
  measurements across various operations and dimensions.
- The `profiling_suite` gains support for adversarial point distributions
  in validation benchmarks, using a robust seed search to ensure valid
  triangulations for complex cases.
- Profiling environment metadata capture is refactored into a reusable
  script, simplifying workflow definitions in GitHub Actions.
- The `justfile` separates `SlotMap` backend linting into a dedicated
  `just check-storage-backends` command, providing a focused compatibility
  canary.
- The `just profile` command's cleanup logic is made more robust, ensuring
  temporary worktree removal in all scenarios.
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
benches/profiling_suite.rs (1)

488-492: ⚠️ Potential issue | 🟡 Minor

Configured percentile is not reflected in the summary label.

You made percentile configurable here, but Line 523 still prints a hardcoded “95th percentile” label, which becomes incorrect when BENCH_PERCENTILE is overridden.

💡 Minimal fix
-    println!(
-        "95th percentile bytes: {} ({:.2} MB)",
-        percentile_95,
-        percentile_95 as f64 / (1024.0 * 1024.0)
-    );
+    println!(
+        "Configured percentile bytes: {} ({:.2} MB)",
+        percentile_95,
+        percentile_95 as f64 / (1024.0 * 1024.0)
+    );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benches/profiling_suite.rs` around lines 488 - 492, The summary label still
uses a hardcoded “95th percentile” string even though you parse BENCH_PERCENTILE
into the percentile variable; update the print/format call that emits the
summary label (where the "95th percentile" text is generated) to use the
percentile variable instead, formatting it as an ordinal (e.g., "{percentile}th
percentile" or compute "st/nd/rd/th" if desired) so the label reflects the
configured value from percentile; ensure any tests or expected strings are
adjusted accordingly.
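If the ordinal formatting is wanted, a small helper along these lines would work (a sketch, not the suite's actual code):

```rust
// Format a percentile as an ordinal label, e.g. 95 -> "95th", 91 -> "91st",
// handling the 11/12/13 exceptions.
fn ordinal(n: u32) -> String {
    let suffix = match (n % 10, n % 100) {
        (1, 11) | (2, 12) | (3, 13) => "th",
        (1, _) => "st",
        (2, _) => "nd",
        (3, _) => "rd",
        _ => "th",
    };
    format!("{n}{suffix}")
}

fn main() {
    assert_eq!(ordinal(95), "95th");
    assert_eq!(ordinal(91), "91st");
    // summary label now tracks the configured percentile
    println!("{} percentile bytes: ...", ordinal(95));
}
```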
benches/ci_performance_suite.rs (1)

304-309: ⚠️ Potential issue | 🟡 Minor

Gate benchmark diagnostics with bench-logging and keep measured closures log-free.

Current diagnostics are runtime-env gated, not feature gated, and error! is still reachable inside b.iter closures. Please gate benchmark diagnostics with #[cfg(feature = "bench-logging")] and remove logging calls from measured closures.

💡 Suggested direction
-fn bench_logging_enabled() -> bool {
-    env::var("DELAUNAY_BENCH_LOG").is_ok_and(|value| value != "0")
-}
+#[cfg(feature = "bench-logging")]
+fn bench_logging_enabled() -> bool {
+    env::var("DELAUNAY_BENCH_LOG").is_ok_and(|value| value != "0")
+}
+
+#[cfg(not(feature = "bench-logging"))]
+const fn bench_logging_enabled() -> bool {
+    false
+}
-        warn!( ... );
+        if bench_logging_enabled() {
+            warn!( ... );
+        }
-                                if bench_logging_enabled() {
-                                    error!( ... );
-                                }
                                 panic!( ... );

As per coding guidelines, "Gate non-essential test/benchmark diagnostics behind feature flags using test-debug for test diagnostics and bench-logging for benchmark diagnostics in Rust code" and "Never log inside hot benchmark loops or Criterion-measured closures; emit setup/summary diagnostics outside the measured path instead in Rust benchmarks".

Also applies to: 427-432, 701-707, 839-848, 889-897

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benches/ci_performance_suite.rs` around lines 304 - 309, Gate all benchmark
diagnostic logging behind the "bench-logging" feature and remove logs from
measured closures: wrap the diagnostic blocks that call warn!, error!, or
similar (the warn! in the shown snippet and the other occurrences at the
reported ranges) with #[cfg(feature = "bench-logging")] (or put the logging in a
small helper function annotated with that attribute) and move any logging that
currently lives inside Criterion-measured closures (the closures passed to
b.iter) to run once before or after b.iter; ensure the code inside b.iter
contains no calls to warn!/error! or other logging macros.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/benchmark_utils.py`:
- Around line 900-905: The _ci_suite_dimension function's regex doesn't match
dimensions preceded by a slash (e.g., "boundary_facets/2d/1000") because the
leading anchor only allows start or underscore; update the pattern to accept
slash as a valid prefix by changing the leading non-capturing group to include
"/" (so it becomes (?:^|_|/)), keep the rest of the pattern (capture (\d+)d and
allow trailing _ or / or end), and ensure the function still returns
f"{match.group(1)}D" or "n/a" as before; search for _ci_suite_dimension and
benchmark_id to locate the change.
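The widened pattern can be sketched as follows (helper name and return values follow the comment):

```python
import re

# "/" is now a valid prefix/suffix around the <N>d token, so IDs like
# "boundary_facets/2d/1000" resolve alongside "validation_components_3d".
_CI_SUITE_DIM_RE = re.compile(r"(?:^|_|/)(\d+)d(?:_|/|$)")

def ci_suite_dimension(benchmark_id: str) -> str:
    match = _CI_SUITE_DIM_RE.search(benchmark_id)
    return f"{match.group(1)}D" if match else "n/a"
```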
- Around line 492-494: Narrow the broad except in the ci_performance_suite
runner by catching the specific errors that run_cargo_command() can raise:
replace the generic "except Exception as e:" around the run_cargo_command() call
with "except ExecutableNotFoundError as e:", "except subprocess.TimeoutExpired
as e:", and "except OSError as e:" (do not add subprocess.CalledProcessError
since run_cargo_command() isn't called with check=True); also add "from
subprocess_utils import ExecutableNotFoundError" to the imports at the top of
the file and keep the existing error logging/return False behavior for each case
so failures remain visible.
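A runnable sketch of the narrowed handler; `ExecutableNotFoundError` is stubbed here because it normally comes from the project's `subprocess_utils` module, and the tuple form is equivalent to the separate `except` clauses the comment describes:

```python
import subprocess

class ExecutableNotFoundError(RuntimeError):
    """Stand-in for subprocess_utils.ExecutableNotFoundError."""

def run_ci_suite(run_cargo_command) -> bool:
    try:
        run_cargo_command()
    except (ExecutableNotFoundError, subprocess.TimeoutExpired, OSError) as e:
        # keep failures visible, as the original handler did
        print(f"ci_performance_suite failed: {e}")
        return False
    return True
```

Unrelated exceptions (say, a `ValueError` from a logic bug) now propagate instead of being silently swallowed.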


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 951fdaa4-7c60-4b04-bc0c-23f45fe64b2b

📥 Commits

Reviewing files that changed from the base of the PR and between a0ab0fe and dad0588.

📒 Files selected for processing (7)
  • .github/workflows/profiling-benchmarks.yml
  • benches/ci_performance_suite.rs
  • benches/profiling_suite.rs
  • docs/dev/commands.md
  • justfile
  • scripts/benchmark_utils.py
  • scripts/ci/capture_profiling_metadata.sh
✅ Files skipped from review due to trivial changes (1)
  • docs/dev/commands.md

- Upgrade `uv` Python package manager to version `0.11.8` across all
  GitHub Actions workflows for consistent environments.
- Make internal benchmark logging conditional on the `bench-logging` feature
  in `ci_performance_suite`, reducing default log verbosity.
- Enhance memory profiling in `profiling_suite` with configurable percentile
  calculation for allocation summaries, improving analysis flexibility.
- Improve error handling and reporting in Python benchmark execution scripts
  for greater reliability and clearer failure feedback.
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
benches/profiling_suite.rs (1)

129-140: ⚠️ Potential issue | 🟠 Major

Don't print fake allocation statistics in the default bench configuration.

When count-allocations is off, measure() returns zeros, print_alloc_banner_once() stays silent unless bench-logging is also on, and the suite still emits the full memory summary. That makes the default memory_profiling output look real even though it's placeholder data. Please either skip this summary when allocation counting is disabled, or make the placeholder status explicit without depending on bench-logging.

As per coding guidelines, "Gate non-essential test/benchmark diagnostics behind feature flags using test-debug for test diagnostics and bench-logging for benchmark diagnostics in Rust code".

Also applies to: 520-553, 652-655

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benches/profiling_suite.rs` around lines 129 - 140, The benchmark prints a
full memory summary with fake zeros when the feature "count-allocations" is
disabled; update the code so when count-allocations is not enabled (i.e., in the
#[cfg(not(feature = "count-allocations"))] branch) you either skip emitting the
memory summary entirely or emit an explicit, unconditional placeholder message
(not gated on bench-logging) so consumers know the stats are not real. Locate
the measure function and the print_alloc_banner_once function and change the
summary-emission call sites (where AllocationInfo::default() results are
printed) to check cfg!(feature = "count-allocations") and if false either return
early (no summary) or call
print_alloc_banner_once_unconditional/print_placeholder_message to log a clear
"allocation stats unavailable: count-allocations feature disabled" message;
ensure the gating follows the guideline to not rely on bench-logging for this
placeholder behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/benchmark_utils.py`:
- Around line 104-131: The new CI suite IDs defined in
CI_PERFORMANCE_SUITE_GROUPS (and its order CI_PERFORMANCE_SUITE_GROUP_ORDER) are
only used for markdown, so update the baseline/regression pipeline by extending
CriterionParser.find_criterion_results to emit and compare the expanded suite
IDs (e.g., "boundary_facets/boundary_facets_3d_adversarial/50") instead of
collapsing to the old "{points}_{dimension}" keys; specifically, map parsed
markdown/group names from CI_PERFORMANCE_SUITE_GROUPS into the baseline key
generation and comparison logic inside find_criterion_results (and any baseline
serialization/deserialization paths it uses) so baseline generation and
regression comparisons use the same expanded identifiers and preserve distinct
groups per dimension/input.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: a3fe1711-4067-48fc-ad72-13f2fd3bd02f

📥 Commits

Reviewing files that changed from the base of the PR and between dad0588 and bf8f985.

📒 Files selected for processing (7)
  • .github/workflows/benchmarks.yml
  • .github/workflows/ci.yml
  • .github/workflows/generate-baseline.yml
  • benches/ci_performance_suite.rs
  • benches/profiling_suite.rs
  • scripts/benchmark_utils.py
  • scripts/tests/test_benchmark_utils.py
✅ Files skipped from review due to trivial changes (3)
  • .github/workflows/ci.yml
  • .github/workflows/benchmarks.yml
  • .github/workflows/generate-baseline.yml
🚧 Files skipped from review as they are similar to previous changes (1)
  • benches/ci_performance_suite.rs

Adopt explicit `benchmark_id` fields in Python scripts to enable
more robust identification, storage, and comparison of benchmark
results. This is critical for handling complex benchmark suites,
such as `ci_performance_suite`, which may not always have a simple
`points` based dimension, or which may have multiple benchmarks
for the same points/dimension combination. Baseline files and
markdown reports now support these expanded IDs.

Additionally, internal memory profiling in `profiling_suite.rs`
is made conditional on the `count-allocations` feature. When the
feature is not enabled, allocation tracking structures and logic
are omitted or reduced to no-ops, reducing compilation overhead
and clarifying the intent of memory statistics.
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
scripts/benchmark_utils.py (1)

1573-1616: ⚠️ Potential issue | 🟠 Major

Don't serialize non-size CI cases as 0 Points.

When points is None for the new non-numeric IDs (for example the bistellar-flip .../roundtrip cases), Line 1602 coerces it to 0. That leaks into baseline/comparison output as fake === 0 Points (...) === entries and then gets sorted/rendered like a real workload. This needs an explicit non-numeric input field, or points needs to stay optional end-to-end, instead of using a sentinel zero.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/benchmark_utils.py` around lines 1573 - 1616, The code currently
coerces a missing points value to 0 via BenchmarkData(points or 0, dimension)
which creates bogus "0 Points" workloads; update parse_estimates_json to
preserve points being None by constructing the BenchmarkData with the raw
optional points (i.e., BenchmarkData(points, dimension)) instead of coercing to
0, and ensure the BenchmarkData class/constructor (and any serialization logic)
accepts Optional[int] for its size field and renders a non-numeric/empty label
when points is None; keep the existing conditional throughput block (it already
guards on points is not None) and only compute/attach throughput when points is
present.
benches/profiling_suite.rs (1)

969-980: ⚠️ Potential issue | 🟠 Major

Clamp BENCH_SAMPLE_SIZE to the Criterion minimum before passing it.

The example configuration shows BENCH_SAMPLE_SIZE=5, but Criterion 0.8.2 requires sample_size ≥ 10 and panics if passed a smaller value. Users following the documented example will encounter a runtime panic before the benchmark starts.

Suggested fix
-        let sample_size = env::var("BENCH_SAMPLE_SIZE")
+        let sample_size = env::var("BENCH_SAMPLE_SIZE")
             .ok()
             .and_then(|v| v.parse().ok())
+            .map(|v: usize| v.max(10))
             .unwrap_or(10);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benches/profiling_suite.rs` around lines 969 - 980, The BENCH_SAMPLE_SIZE
environment value is passed directly into
Criterion::default().sample_size(sample_size) but Criterion 0.8.2 requires a
minimum of 10, causing a panic for smaller values; update the logic that
computes sample_size (the let sample_size = ... binding) to clamp or max the
parsed value to 10 before calling Criterion::default().sample_size(...), e.g.
parse the env var as currently done and then replace the unwrap_or with
something that ensures sample_size >= 10 so Criterion::sample_size always
receives at least 10.
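The clamp can be factored as a small testable helper (a sketch; the real code reads the env var directly):

```rust
// Parse BENCH_SAMPLE_SIZE, raising anything below Criterion's minimum of 10
// instead of letting Criterion panic later.
fn sample_size_from(raw: Option<&str>) -> usize {
    raw.and_then(|v| v.parse::<usize>().ok())
        .map(|v| v.max(10))
        .unwrap_or(10)
}

fn main() {
    assert_eq!(sample_size_from(Some("5")), 10); // documented example, clamped
    assert_eq!(sample_size_from(Some("25")), 25);
    assert_eq!(sample_size_from(None), 10);
    assert_eq!(sample_size_from(Some("not-a-number")), 10);
}
```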
🧹 Nitpick comments (3)
scripts/benchmark_utils.py (1)

1621-1639: Centralize the CI-suite ID parsing helpers.

These _ci_suite_group_key / _ci_suite_dimension helpers now exist in both PerformanceSummaryGenerator and CriterionParser. That makes it easy for summary rendering and baseline parsing to drift on the same benchmark IDs. Please lift them to shared module-level helpers and reuse them from both call sites. As per coding guidelines, scripts/*.py: "Focus on code quality, maintainability, and adherence to Python best practices."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/benchmark_utils.py` around lines 1621 - 1639, The two methods
_ci_suite_group_key and _ci_suite_dimension should be extracted from
PerformanceSummaryGenerator into module-level functions (e.g.,
ci_suite_group_key and ci_suite_dimension) in scripts/benchmark_utils.py and the
duplicate copies removed from CriterionParser; update both call sites to call
the new module-level helpers (referencing the original method names
_ci_suite_group_key/_ci_suite_dimension for locating them and the classes
PerformanceSummaryGenerator and CriterionParser for where to change calls), keep
the same behavior and signatures (return types and regex) and add a short
docstring for each shared function to explain purpose.
scripts/tests/test_benchmark_utils.py (2)

2288-2292: Optional: centralize section-title literals used across markdown assertions.

You now assert multiple related headings in several tests; using shared constants would make title changes less brittle.

Also applies to: 2342-2342, 2836-2837

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/tests/test_benchmark_utils.py` around lines 2288 - 2292, Centralize
the repeated markdown heading literals by defining shared constants (e.g.,
PUBLIC_API_TITLE = "### Public API Performance Contract
(`ci_performance_suite`)", CIRCUMSPHERE_TITLE = "## Circumsphere Predicate
Analysis", PERFORMANCE_RANKING_TITLE = "### Performance Ranking",
RECOMMENDATIONS_TITLE = "### Recommendations", PERFORMANCE_UPDATES_TITLE = "##
Performance Data Updates") at the top of scripts/tests/test_benchmark_utils.py
and replace the inline string assertions that reference those exact literals
with the corresponding constants; also update the other occurrences noted (near
the other assertion locations) to use the same constants so future heading
changes only require updating one definition.

266-279: Consider extracting duplicated write_estimate test helper.

The same nested helper appears twice. Pulling it into a shared module-level helper/fixture would reduce maintenance drift.

Also applies to: 2299-2312

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/tests/test_benchmark_utils.py` around lines 266 - 279, Two identical
nested helpers named write_estimate are duplicated in the tests; extract that
logic into a single module-level helper or pytest fixture (e.g., a top-level
function write_estimate(target_dir, path_parts, mean_ns) or a fixture like
write_estimate_fixture) and replace both nested definitions with calls to the
common helper; update usages in the tests that currently call the nested
write_estimate (the copies shown and the one at lines ~2299-2312) so they import
or reference the new module-level function/fixture and keep the same behavior
(creating criterion/<path_parts>/base, writing estimates.json with
mean.point_estimate and confidence_interval).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@benches/profiling_suite.rs`:
- Around line 579-643: The allocation diagnostics (calls around
print_alloc_banner_once(), the #[cfg(feature = "count-allocations")] block that
computes avg_info using allocation_infos, actual_point_counts, bytes_max_values,
configured_percentile(), calculate_percentile(), and print_alloc_summary()) are
being executed inside the Criterion measured closure; move this entire
diagnostics emission out of b.iter_custom (or any Criterion-measured closure) so
it runs once during setup/teardown instead of per-sample, and gate it behind the
bench-logging feature (e.g., #[cfg(feature = "bench-logging")] around the moved
code) while keeping the computation helpers (configured_percentile,
calculate_percentile) accessible from the new location.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 9c1b1ad5-8d86-4942-9eae-c1804923cba4

📥 Commits

Reviewing files that changed from the base of the PR and between bf8f985 and d06e2a6.

📒 Files selected for processing (5)
  • benches/profiling_suite.rs
  • scripts/benchmark_models.py
  • scripts/benchmark_utils.py
  • scripts/tests/test_benchmark_models.py
  • scripts/tests/test_benchmark_utils.py

…oling

Update benchmark scripts to correctly model and report workloads that
do not operate on a numeric input size (e.g., bistellar flips). This
involves allowing benchmark data to represent an absent point count
and adapting parsing, sorting, and reporting logic. Memory profiling
in `profiling_suite.rs` is also refined: allocation statistics are
now gated on both `count-allocations` and `bench-logging` features,
and reporting logic is restructured for clarity and robustness across
benchmark samples. Utility functions were extracted in Python.
@acgetchell
Owner Author

@coderabbitai full review

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (1)
scripts/benchmark_utils.py (1)

183-188: Derive non-numeric labels from the ID instead of hard-coded defaults.

Right now every non-adversarial case becomes well-conditioned, and every non-numeric tail becomes roundtrip. That matches today's flip benchmarks by coincidence, but it will mislabel the next unsized or non-adversarial contract entry. Prefer parsing the relevant token from benchmark_id/path_parts and falling back to something neutral like n/a when no explicit variant exists.

Also applies to: 918-923
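One way to derive the label from the ID, as the comment suggests (a minimal sketch; the variant token names and the `n/a` fallback are assumptions, not the crate's real contract):

```python
def variant_from_benchmark_id(benchmark_id: str) -> str:
    """Pick the variant label out of a Criterion ID instead of hard-coding it."""
    known_variants = {"adversarial", "well_conditioned", "roundtrip"}
    # Scan path segments from the tail, where Criterion IDs usually put the variant.
    for token in reversed(benchmark_id.split("/")):
        if token in known_variants:
            return token
    return "n/a"  # neutral fallback when no explicit variant token exists
```

With this, a future unsized or non-adversarial entry that lacks a variant token reports `n/a` rather than silently inheriting `well-conditioned` or `roundtrip`.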

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@benches/profiling_suite.rs`:
- Around line 118-122: The print_alloc_banner_once function currently prints an
allocation banner unconditionally; gate that non-essential benchmark diagnostic
behind the bench-logging feature so it only prints when bench-logging is enabled
(e.g., surround the println! call or the call_once invocation with cfg(feature =
"bench-logging") or make the closure no-op unless the feature is enabled);
update the same pattern at the other occurrence referenced (lines ~655-656) so
both places only emit the banner when bench-logging is active, keeping the
Once/ONCE usage intact.

In `@docs/dev/commands.md`:
- Around line 106-112: The docs claim "`just check` validates the default
DenseSlotMap backend plus all feature combinations" which is inaccurate; update
the sentence in docs/dev/commands.md to state that "`just check` runs checks for
the default backend (DenseSlotMap) and also performs an all-features pass" (or
similar wording), and explicitly note that the justfile only runs clippy for the
default feature set and for --all-features; preserve the existing guidance to
run the legacy SlotMap backend explicitly with `just check-storage-backends`.

In `@scripts/benchmark_models.py`:
- Around line 29-33: The comparison_key property currently maps all unsized
benchmarks (points is None) in the same dimension to "unsized_<dimension>"
causing collisions; update comparison_key (the property method in the class) so
that if self.points is None it requires a real self.benchmark_id and raises a
clear exception (e.g., ValueError) when benchmark_id is missing, rather than
falling back to "unsized_<dimension>"; this forces callers to supply a unique
benchmark_id for unsized workloads and prevents silent overwrites in
baseline/comparison dicts.

In `@scripts/benchmark_utils.py`:
- Around line 942-1008: The parser _parse_ci_performance_suite_results currently
accepts any estimates.json under self.circumsphere_results_dir and can pick up
stale files; modify it (and the companion _process_ci_performance_suite_results)
to only consider benchmark IDs listed in the CI-suite manifest emitted by
ci_performance_suite.rs: capture the manifest from the benchmark stdout at
run-time and pass it (or store it under a known path) to the code that builds
estimates_by_id, then when iterating over
sorted(criterion_dir.glob("**/estimates.json")) skip any estimates_path whose
relative path parts (the benchmark_id formed by path_parts) are not present in
the manifest set before adding to estimates_by_id; ensure fallback behavior when
no manifest exists remains safe (e.g., return [] or keep current logic) and
update any callers that run the benchmark to write/forward the manifest to the
same results directory so _parse_ci_performance_suite_results and
_process_ci_performance_suite_results can use it.

In `@scripts/ci/capture_profiling_metadata.sh`:
- Around line 35-44: The script echoes GITHUB_REF_NAME and RUNNER_OS without
defaults which will fail under set -u when not running in GitHub Actions; update
the echo lines to use parameter expansion defaults (e.g. replace
${GITHUB_REF_NAME} with ${GITHUB_REF_NAME:-local} and ${RUNNER_OS} with
${RUNNER_OS:-$(uname -s)}) so the script runs locally and in CI without error,
keeping the rest of the variable names (declared_toolchain, rust_version,
benchmark_filter, profiling_mode) unchanged.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 47b7fb3e-a41e-4c6c-8ff1-b66ab13ef144

📥 Commits

Reviewing files that changed from the base of the PR and between 6e437fb and a88f788.

📒 Files selected for processing (23)
  • .github/workflows/benchmarks.yml
  • .github/workflows/ci.yml
  • .github/workflows/generate-baseline.yml
  • .github/workflows/profiling-benchmarks.yml
  • Cargo.toml
  • benches/PERFORMANCE_RESULTS.md
  • benches/README.md
  • benches/ci_performance_suite.rs
  • benches/microbenchmarks.rs
  • benches/profiling_suite.rs
  • docs/code_organization.md
  • docs/dev/commands.md
  • examples/convex_hull_3d_100_points.rs
  • examples/pachner_roundtrip_4d.rs
  • justfile
  • scripts/README.md
  • scripts/benchmark_models.py
  • scripts/benchmark_utils.py
  • scripts/ci/capture_profiling_metadata.sh
  • scripts/tests/test_benchmark_models.py
  • scripts/tests/test_benchmark_utils.py
  • src/lib.rs
  • tests/prelude_exports.rs
💤 Files with no reviewable changes (3)
  • docs/code_organization.md
  • Cargo.toml
  • benches/microbenchmarks.rs

Comment thread benches/profiling_suite.rs
Comment thread docs/dev/commands.md Outdated
Comment thread scripts/benchmark_models.py Outdated
Comment thread scripts/benchmark_models.py
Comment thread scripts/benchmark_utils.py
Comment thread scripts/ci/capture_profiling_metadata.sh Outdated
Introduces a manifest system to accurately track and filter benchmark
results from the `ci_performance_suite`. This prevents incorrect
performance comparisons due to stale Criterion output directories.

Additionally, this change enforces explicit benchmark IDs for unsized
workloads to ensure stable comparison keys, improves local profiling
metadata capture, and refines conditional allocation logging.
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
scripts/benchmark_utils.py (1)

185-199: ⚠️ Potential issue | 🟠 Major

Fail closed when a successful CI-suite run does not yield a manifest.

If _parse_ci_performance_manifest_ids(stdout) returns nothing here, the sidecar file is removed and both CI parsers later fall back to scanning every matching estimates.json. In compare_with_baseline() and dev baseline generation, that reintroduces the stale-results problem this manifest was added to prevent. Please make manifest extraction a required postcondition for successful ci_performance_suite runs instead of silently dropping back to unfiltered discovery.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/benchmark_utils.py` around lines 185 - 199, In
_write_ci_performance_manifest_ids ensure manifest extraction is a required
postcondition for successful ci_performance_suite runs: when
_parse_ci_performance_manifest_ids(stdout) returns empty, do not unlink the
manifest or silently fall back; instead raise a clear exception (e.g.,
RuntimeError) that includes context (stdout and that ci_performance_suite ran)
so callers like compare_with_baseline and dev baseline generation can fail fast;
keep writing behavior when manifest_ids is present and still create
criterion_dir and write the sorted manifest_ids as before.
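The fail-closed behavior asked for above could be sketched as follows. The manifest line format (`api_benchmark_manifest schema=1 id=...`) is an assumption based on the PR description's `schema=1` manifest, and the helper names mirror the prompt rather than the real module:

```python
from pathlib import Path


def parse_ci_performance_manifest_ids(stdout: str) -> set[str]:
    """Extract benchmark IDs from manifest lines in the suite's stdout."""
    ids: set[str] = set()
    for line in stdout.splitlines():
        if line.startswith("api_benchmark_manifest schema=1 id="):
            ids.add(line.split("id=", 1)[1].strip())
    return ids


def write_ci_performance_manifest_ids(stdout: str, criterion_dir: Path) -> Path:
    """Write the sorted manifest sidecar; raise instead of falling back when empty."""
    manifest_ids = parse_ci_performance_manifest_ids(stdout)
    if not manifest_ids:
        # A successful ci_performance_suite run must yield a manifest; an empty
        # one would silently reintroduce unfiltered (stale) result discovery.
        raise RuntimeError(
            "ci_performance_suite completed but emitted no manifest IDs; "
            f"refusing to fall back to unfiltered discovery.\nstdout:\n{stdout}"
        )
    criterion_dir.mkdir(parents=True, exist_ok=True)
    manifest_path = criterion_dir / "ci_performance_manifest_ids.txt"
    manifest_path.write_text("\n".join(sorted(manifest_ids)) + "\n")
    return manifest_path
```

Callers like `compare_with_baseline()` then fail fast with context instead of comparing against whatever `estimates.json` files happen to be on disk.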
🧹 Nitpick comments (2)
benches/profiling_suite.rs (1)

48-48: Documentation could clarify the minimum sample size.

Line 974 clamps BENCH_SAMPLE_SIZE to a minimum of 10 via size.max(10), but the documentation at line 48 only states "(default: 10)" without mentioning this floor. Users setting BENCH_SAMPLE_SIZE=5 would get 10 samples, which could be unexpected.

📝 Suggested documentation update
-//! - `BENCH_SAMPLE_SIZE`: Override Criterion sample size (default: 10)
+//! - `BENCH_SAMPLE_SIZE`: Override Criterion sample size (default: 10, minimum: 10)

Also applies to: 971-974

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benches/profiling_suite.rs` at line 48, The docs mention BENCH_SAMPLE_SIZE
defaulting to 10 but omit that the code enforces a minimum; update the
comment/docstring around BENCH_SAMPLE_SIZE to state that values below 10 are
clamped to 10 (the clamp is implemented via size.max(10) in the profiling
logic), so clarify that setting BENCH_SAMPLE_SIZE=5 will result in 10 samples
and reference the clamping behavior in the text.
scripts/benchmark_utils.py (1)

1005-1076: Deduplicate the CI-suite Criterion discovery path.

_parse_ci_performance_suite_results() and CriterionParser._process_ci_performance_suite_results() now carry near-identical manifest filtering, base/new preference, ID reconstruction, and sorting logic. This is exactly the kind of code that will drift the next time the manifest or ID format changes. Pulling the shared walk/filter step into one helper would make future fixes much safer.

Also applies to: 1697-1748

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/benchmark_utils.py` around lines 1005 - 1076, The two functions
_parse_ci_performance_suite_results and
CriterionParser._process_ci_performance_suite_results duplicate the Criterion
discovery/walk, manifest filtering, base/new preference, ID reconstruction and
group filtering; extract that shared logic into a helper (e.g.,
_collect_ci_suite_estimates or similar) which takes the criterion_dir and
manifest_ids (using _load_ci_performance_manifest_ids) and returns a
deduplicated list of (path_parts, estimates_path) or a dict equivalent after
applying ci_suite_group_key and choosing "new" over "base"; then have both
_parse_ci_performance_suite_results and
CriterionParser._process_ci_performance_suite_results call this helper and keep
only their result-parsing and sorting (use CI_PERFORMANCE_SUITE_GROUP_ORDER and
the existing dimension/input_size parsing in each caller).
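The shared walk/filter helper proposed above might look like this sketch (function name and directory layout assumptions follow Criterion's usual `<benchmark id>/<base|new>/estimates.json` structure, not necessarily the repo's exact code):

```python
from pathlib import Path


def collect_ci_suite_estimates(
    criterion_dir: Path, manifest_ids: set[str]
) -> dict[str, Path]:
    """Single discovery pass shared by both CI-suite parsers.

    Filters out stale IDs not in the run's manifest and prefers 'new'
    results over 'base' for each benchmark ID.
    """
    chosen: dict[str, Path] = {}
    for estimates_path in sorted(criterion_dir.glob("**/estimates.json")):
        rel_parts = estimates_path.relative_to(criterion_dir).parts
        if len(rel_parts) < 3:
            continue  # not <id parts...>/<base|new>/estimates.json
        *id_parts, kind, _ = rel_parts
        if kind not in ("base", "new"):
            continue
        benchmark_id = "/".join(id_parts)
        if benchmark_id not in manifest_ids:
            continue  # skip stale Criterion output from earlier runs
        if benchmark_id not in chosen or kind == "new":
            chosen[benchmark_id] = estimates_path
    return chosen
```

Both `_parse_ci_performance_suite_results` and `_process_ci_performance_suite_results` would then only keep their own result parsing and group ordering on top of this shared dict.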
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/benchmark_utils.py`:
- Around line 545-561: The helper _run_ci_performance_suite currently always
appends DEV_MODE_BENCH_ARGS making trusted-profile runs still use dev sampling;
change the function signature to accept an explicit boolean (e.g., use_dev_mode:
bool = False) or a sampling_mode flag, and build cargo_args so that
DEV_MODE_BENCH_ARGS are only extended when use_dev_mode is true; keep
TRUSTED_BENCH_PROFILE logic as-is but omit DEV_MODE_BENCH_ARGS by default so
full/trusted runs produce proper full-sampling Criterion data (update callers to
pass True where dev mode is intended).
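The opt-in dev-mode flag might be wired up like this (the argument values in `DEV_MODE_BENCH_ARGS` are illustrative assumptions; only the `perf` profile and bench name come from the PR):

```python
# Illustrative reduced-sampling args; the real values live in benchmark_utils.py.
DEV_MODE_BENCH_ARGS = ["--", "--sample-size", "10", "--warm-up-time", "1"]
TRUSTED_BENCH_PROFILE = "perf"


def build_ci_suite_cargo_args(use_dev_mode: bool = False) -> list[str]:
    """Build the cargo invocation, appending dev sampling only when asked."""
    cargo_args = [
        "cargo", "bench",
        "--profile", TRUSTED_BENCH_PROFILE,
        "--bench", "ci_performance_suite",
    ]
    if use_dev_mode:
        cargo_args.extend(DEV_MODE_BENCH_ARGS)
    return cargo_args
```

Full/trusted runs call `build_ci_suite_cargo_args()` and get proper full-sampling Criterion data; local development passes `use_dev_mode=True` explicitly for faster feedback.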

---

Duplicate comments:
In `@scripts/benchmark_utils.py`:
- Around line 185-199: In _write_ci_performance_manifest_ids ensure manifest
extraction is a required postcondition for successful ci_performance_suite runs:
when _parse_ci_performance_manifest_ids(stdout) returns empty, do not unlink the
manifest or silently fall back; instead raise a clear exception (e.g.,
RuntimeError) that includes context (stdout and that ci_performance_suite ran)
so callers like compare_with_baseline and dev baseline generation can fail fast;
keep writing behavior when manifest_ids is present and still create
criterion_dir and write the sorted manifest_ids as before.

---

Nitpick comments:
In `@benches/profiling_suite.rs`:
- Line 48: The docs mention BENCH_SAMPLE_SIZE defaulting to 10 but omit that the
code enforces a minimum; update the comment/docstring around BENCH_SAMPLE_SIZE
to state that values below 10 are clamped to 10 (the clamp is implemented via
size.max(10) in the profiling logic), so clarify that setting
BENCH_SAMPLE_SIZE=5 will result in 10 samples and reference the clamping
behavior in the text.

In `@scripts/benchmark_utils.py`:
- Around line 1005-1076: The two functions _parse_ci_performance_suite_results
and CriterionParser._process_ci_performance_suite_results duplicate the
Criterion discovery/walk, manifest filtering, base/new preference, ID
reconstruction and group filtering; extract that shared logic into a helper
(e.g., _collect_ci_suite_estimates or similar) which takes the criterion_dir and
manifest_ids (using _load_ci_performance_manifest_ids) and returns a
deduplicated list of (path_parts, estimates_path) or a dict equivalent after
applying ci_suite_group_key and choosing "new" over "base"; then have both
_parse_ci_performance_suite_results and
CriterionParser._process_ci_performance_suite_results call this helper and keep
only their result-parsing and sorting (use CI_PERFORMANCE_SUITE_GROUP_ORDER and
the existing dimension/input_size parsing in each caller).
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 83e0a23f-1615-4ae0-abb6-9abdcd6ff6dd

📥 Commits

Reviewing files that changed from the base of the PR and between bf8f985 and c301957.

📒 Files selected for processing (7)
  • benches/profiling_suite.rs
  • docs/dev/commands.md
  • scripts/benchmark_models.py
  • scripts/benchmark_utils.py
  • scripts/ci/capture_profiling_metadata.sh
  • scripts/tests/test_benchmark_models.py
  • scripts/tests/test_benchmark_utils.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • docs/dev/commands.md
  • scripts/tests/test_benchmark_utils.py

Comment thread scripts/benchmark_utils.py Outdated
This commit introduces comprehensive improvements to the Python scripts for
benchmarking, profiling, and hardware analysis. Key changes focus on improving
code robustness, type safety, and overall maintainability.

- **New Development Guidelines:** Adds `docs/dev/python.md` to formalize best
  practices for Python automation, guiding developers on rigorous typing,
  structured subprocess mocking, and precise exception handling.
- **Stricter Linting and Typing:** Updates `pyproject.toml` to enable additional
  Ruff rules for type annotation coverage (`ANN201`, `ANN202`, `ANN204`) and
  re-enable `BLE001` (blind exception catch), applying these quality checks
  across all Python scripts.
- **Robust Exception Handling:** Replaces broad `except Exception` clauses with
  specific, recoverable error families (`_RECOVERABLE_CLI_ERRORS`) within
  core benchmark, hardware, and comparison utilities, preventing silent
  failures and improving debuggability.
- **Typed Subprocess Mocks:** Updates test fixtures to return fully typed
  `subprocess.CompletedProcess` objects, aligning with new mocking guidance
  and enhancing test reliability for command-line interactions.
- **Internal Refactorings:** Extracts and clarifies helper functions in
  `benchmark_utils.py`, `hardware_utils.py`, and `postprocess_changelog.py`
  to improve code readability and reusability.
- **Dev Mode Benchmark Options:** Enables `ci_performance_suite` to
  conditionally apply reduced Criterion sampling arguments for faster local
  development feedback.
- **Benchmark Config Clarification:** Adds a note to `profiling_suite.rs`
  clarifying Criterion's `BENCH_SAMPLE_SIZE` clamping behavior.
Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
scripts/compare_storage_backends.py (1)

47-55: Keep KeyError and TypeError out of the top-level recoverable set.

Those two now get swallowed by run_comparison(), _run_benchmark(), and main(), so a broken result shape or report-generation bug is downgraded to a generic CLI failure instead of surfacing the actual programming error. The JSON parsing path already handles malformed data locally, which is the right place for these if you really need them.

Suggested fix
 _RECOVERABLE_COMPARISON_ERRORS: tuple[type[BaseException], ...] = (
     ExecutableNotFoundError,
     OSError,
     RuntimeError,
-    TypeError,
     ValueError,
-    KeyError,
     subprocess.SubprocessError,
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/compare_storage_backends.py` around lines 47 - 55, The top-level
recoverable set _RECOVERABLE_COMPARISON_ERRORS currently includes KeyError and
TypeError which causes programming errors to be swallowed inside run_comparison,
_run_benchmark, and main; remove KeyError and TypeError from the tuple so those
exceptions propagate normally and are not downgraded to generic CLI failures,
leaving any localized JSON/parsing handlers to catch malformed data where
appropriate (verify JSON parsing code still handles KeyError/TypeError locally).
scripts/hardware_utils.py (1)

162-171: Fall back to /proc/cpuinfo when lscpu is present but unusable.

Right now any execution/parsing failure in _get_linux_cpu_cores_from_lscpu() returns "Unknown" and stops the lookup, even though /proc/cpuinfo may still have usable data. That regresses reliability on stripped containers and odd distros where lscpu exists but its output is incomplete.

Suggested fix
     def _get_linux_cpu_cores(self) -> str:
         """
         Get CPU core count on Linux.
 
         Returns:
             CPU core count or "Unknown"
         """
-        if not shutil.which("lscpu"):
-            return self._get_linux_cpu_cores_from_proc()
-        return self._get_linux_cpu_cores_from_lscpu()
+        if shutil.which("lscpu"):
+            cores = self._get_linux_cpu_cores_from_lscpu()
+            if cores != "Unknown":
+                return cores
+        return self._get_linux_cpu_cores_from_proc()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/hardware_utils.py` around lines 162 - 171, The current
_get_linux_cpu_cores returns "Unknown" whenever _get_linux_cpu_cores_from_lscpu
fails, preventing a fallback to /proc; modify _get_linux_cpu_cores so it first
checks shutil.which("lscpu") and if present calls
_get_linux_cpu_cores_from_lscpu but catches any exception or an "Unknown" result
and then calls _get_linux_cpu_cores_from_proc as a fallback; ensure exceptions
from _get_linux_cpu_cores_from_lscpu are handled (no crash) and the final
returned string comes from _get_linux_cpu_cores_from_proc when lscpu output is
unusable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@scripts/compare_storage_backends.py`:
- Around line 47-55: The top-level recoverable set
_RECOVERABLE_COMPARISON_ERRORS currently includes KeyError and TypeError which
causes programming errors to be swallowed inside run_comparison, _run_benchmark,
and main; remove KeyError and TypeError from the tuple so those exceptions
propagate normally and are not downgraded to generic CLI failures, leaving any
localized JSON/parsing handlers to catch malformed data where appropriate
(verify JSON parsing code still handles KeyError/TypeError locally).

In `@scripts/hardware_utils.py`:
- Around line 162-171: The current _get_linux_cpu_cores returns "Unknown"
whenever _get_linux_cpu_cores_from_lscpu fails, preventing a fallback to /proc;
modify _get_linux_cpu_cores so it first checks shutil.which("lscpu") and if
present calls _get_linux_cpu_cores_from_lscpu but catches any exception or an
"Unknown" result and then calls _get_linux_cpu_cores_from_proc as a fallback;
ensure exceptions from _get_linux_cpu_cores_from_lscpu are handled (no crash)
and the final returned string comes from _get_linux_cpu_cores_from_proc when
lscpu output is unusable.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 0ff11e9e-f1eb-4fdd-88b3-718f349fc796

📥 Commits

Reviewing files that changed from the base of the PR and between c301957 and 70703fb.

📒 Files selected for processing (14)
  • benches/profiling_suite.rs
  • docs/dev/python.md
  • pyproject.toml
  • scripts/benchmark_utils.py
  • scripts/compare_storage_backends.py
  • scripts/hardware_utils.py
  • scripts/postprocess_changelog.py
  • scripts/tests/conftest.py
  • scripts/tests/test_benchmark_models.py
  • scripts/tests/test_benchmark_utils.py
  • scripts/tests/test_compare_storage_backends.py
  • scripts/tests/test_hardware_utils.py
  • scripts/tests/test_subprocess_utils.py
  • scripts/tests/test_tag_release.py
✅ Files skipped from review due to trivial changes (4)
  • scripts/tests/test_tag_release.py
  • docs/dev/python.md
  • scripts/tests/test_compare_storage_backends.py
  • scripts/tests/test_subprocess_utils.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • scripts/tests/test_benchmark_models.py
  • benches/profiling_suite.rs
  • scripts/benchmark_utils.py

@acgetchell acgetchell merged commit 0acbf65 into main Apr 28, 2026
21 checks passed
@acgetchell acgetchell deleted the perf/bench-profile-toolchains branch April 28, 2026 17:29

Labels

  • api
  • documentation (Improvements or additions to documentation)
  • enhancement (New feature or request)
  • geometry (Geometry-related issues)
  • rust (Pull requests that update rust code)
  • topology

1 participant