Improve backend benchmarking with hierarchical names, expanded workloads, and split CI suites by PhilippGrulich · Pull Request #238 · nebulastream/nautilus

PhilippGrulich · 2026-04-10T05:40:00Z

- Add BenchmarkUtil.hpp with shared getEnabledBackends() to eliminate
  duplicated #ifdef blocks across all benchmark files
- Adopt hierarchical benchmark names (e.g., tracing/exception/add,
  pipeline/compile/mlir/fibonacci, execution/bc/collatz) so the
  benchmark-action UI can group metrics by category
- Add Catch2 tags ([tracing], [pipeline], [execution], [tiered]) to
  enable running benchmark subsets independently
- Expand ExecutionBenchmark from 3 to 8 workloads (add, fibonacci,
  sumLoop, collatz, nestedSumLoop, ifThenElse, gcd, arraySum) covering
  arithmetic, loops, branching, memory, and nested patterns
- Add native C++ baselines (execution/native/*) for direct comparison
  against JIT-compiled backends
- Add interpreted baselines (execution/interpreted/*) for overhead
  measurement
- Split benchmark.yml into 4 named benchmark-action suites (Tracing
  Overhead, Compilation Pipeline, Execution Throughput, Tiered
  Compilation) so each gets its own chart on GitHub Pages
- Increase chart history from 20 to 50 data points
- Add 150% alert threshold for regression detection
- Upload benchmark results as CI artifacts for debugging
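
For illustration, a shared backend helper of the kind described above might look roughly like the following. This is a minimal sketch, not the contents of BenchmarkUtil.hpp: the macro names (ENABLE_MLIR_BACKEND, ENABLE_C_BACKEND, ENABLE_BC_BACKEND), the return type, and the benchmarkName() helper are assumptions made for the example. Only the function name getEnabledBackends() and the backend labels mlir, cpp, and bc come from the description.

```cpp
#include <string>
#include <vector>

// Sketch only: the macro names below are assumed, not taken from the PR.
// Collecting the #ifdef checks in one place means each benchmark file can
// iterate over the result instead of repeating the preprocessor blocks.
inline std::vector<std::string> getEnabledBackends() {
    std::vector<std::string> backends;
#ifdef ENABLE_MLIR_BACKEND
    backends.emplace_back("mlir");
#endif
#ifdef ENABLE_C_BACKEND
    backends.emplace_back("cpp");
#endif
#ifdef ENABLE_BC_BACKEND
    backends.emplace_back("bc");
#endif
    return backends;
}

// Hypothetical helper: joins category, backend, and workload into a
// hierarchical name such as "execution/bc/collatz".
inline std::string benchmarkName(const std::string& category,
                                 const std::string& backend,
                                 const std::string& workload) {
    return category + "/" + backend + "/" + workload;
}
```

A benchmark file could then loop over getEnabledBackends() and register one benchmark per backend under the name returned by benchmarkName().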

https://claude.ai/code/session_01He1xQnUZMRThAb4wvQtiN1
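
The 150% alert threshold listed above amounts to a ratio check between the latest measurement and the previous one. The sketch below shows that arithmetic under stated assumptions: the function name, the nanosecond units, and the strict comparison are illustrative and are not taken from benchmark.yml or from the benchmark action itself.

```cpp
// Sketch only: assumes "smaller is better" timings and a strict comparison.
// Returns true when the current time exceeds thresholdPercent of the
// previous time, e.g. more than 1.5x slower for a 150% threshold.
inline bool exceedsAlertThreshold(double previousNs, double currentNs,
                                  double thresholdPercent = 150.0) {
    if (previousNs <= 0.0) {
        return false;  // no usable baseline, so nothing to compare against
    }
    return (currentNs / previousNs) * 100.0 > thresholdPercent;
}
```

With these assumptions, a benchmark that goes from 100 ns to 160 ns would be flagged, while one that goes from 100 ns to 140 ns would not.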

…stom dashboard

Overhaul the benchmark infrastructure to enable meaningful backend comparison, add plugin/intrinsic coverage, and provide a richer UI.

Benchmark infrastructure:
- Add BenchmarkUtil.hpp with shared getEnabledBackends() eliminating duplicated #ifdef blocks across all benchmark files
- Adopt hierarchical benchmark names (e.g., execution/mlir/fibonacci, pipeline/compile/cpp/gcd, plugins/simd/native/dotProduct)
- Add Catch2 tags ([tracing], [pipeline], [execution], [tiered], [plugins]) for running benchmark subsets independently
- Pre-compile all JIT functions once before Catch2 benchmark loops (fixes timeout caused by recompilation on every sample)

Execution benchmarks (ExecutionBenchmark.cpp):
- Expand from 3 to 7 workloads: add, fibonacci, sumLoop, collatz, ifThenElse, gcd, arraySum
- Add native C++ baselines (execution/native/*)
- Add interpreted baselines (execution/interpreted/*)
- Fix collatz int32_t overflow by using int64_t
- Skip arraySum on bc backend (too slow for 1M interpreted elements)

Pipeline benchmarks (TracingBenchmark.cpp):
- Pre-compute trace/SSA/IR once per function, cache IR for reuse across backends (eliminates redundant tracing per sample)
- Rename tracing contexts: trace -> exception, completing_trace -> lazy

Plugin intrinsic benchmarks (new):
- PluginBenchmark.cpp: math (sqrt, sin, cos, exp, log, pow, fma, floor, ceil, composite expr), bit (popcount, countl_zero, countr_zero, byteswap, rotl, composite bit-mix), memory (memcpy/memset at 64B/4KB/1MB), all across backends with native baselines
- SimdBenchmark.cpp: SIMD vector ops (vectorAdd, vectorMul, vectorFma, reduceAdd, dotProduct, distanceSquared, vectorAddInt, reduceAddInt) with native scalar baselines
- Separate nautilus-plugin-benchmarks executable linking nautilus-std and nautilus-simd

CI workflow (benchmark.yml):
- Split into 5 named benchmark-action suites (Tracing Overhead, Compilation Pipeline, Execution Throughput, Tiered Compilation, Plugin Intrinsics); each gets its own chart on GitHub Pages
- First step fetches pages branch; steps 2-5 use skip-fetch-gh-pages to avoid non-fast-forward conflicts on the local ref
- Single push at the end with all data + custom dashboard
- Increase chart history from 20 to 50 data points
- Add 150% alert threshold for regression detection
- Upload benchmark results as CI artifacts

Custom dashboard (docs/benchmark/dashboard.html):
- Self-contained Chart.js page deployed to GitHub Pages
- Overview: grouped bar chart + speedup-vs-native table
- Per-category tabs: execution, pipeline, tracing, tiered, plugins
- Plugin intrinsics tab: math/bit/simd/memory charts + speedup table
- Historical trend charts across all suites

https://claude.ai/code/session_01He1xQnUZMRThAb4wvQtiN1
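
The collatz overflow fix mentioned under the execution benchmarks can be illustrated with a native baseline along these lines. This is a sketch under assumptions: the function name collatzSteps and the choice to return a step count are illustrative and may not match the workload in ExecutionBenchmark.cpp. The point it shows is that the intermediate value 3n + 1 can grow well beyond the starting value, so a 64-bit type gives the trajectory far more headroom than int32_t.

```cpp
#include <cstdint>

// Sketch only: counts the Collatz steps needed to reach 1 from n.
// std::int64_t is used for the running value because 3 * n + 1 can
// exceed the int32_t range even when the starting value fits in it.
inline std::int64_t collatzSteps(std::int64_t n) {
    std::int64_t steps = 0;
    while (n > 1) {
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        ++steps;
    }
    return steps;
}
```

A baseline of this kind can be timed under a name such as execution/native/collatz and compared against the same workload on each JIT backend.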

PhilippGrulich force-pushed the claude/improve-backend-benchmarking-HjDRt branch 2 times, most recently from 53f3830 to 0238c53 on April 12, 2026 13:44

PhilippGrulich force-pushed the claude/improve-backend-benchmarking-HjDRt branch from 0238c53 to 3f90273 on April 14, 2026 01:19

PhilippGrulich wants to merge 1 commit into main from claude/improve-backend-benchmarking-HjDRt

PhilippGrulich commented Apr 10, 2026
