Add assume_constant, assume_range, assume_nonzero specializations #229

Open
PhilippGrulich wants to merge 1 commit into main from claude/specialization-plugin-exploration-WtBHV
@PhilippGrulich
Member

Extend the specialization plugin with three new compile-time hints for
int64_t values, lowered to arith.cmpi + llvm.intr.assume in the MLIR
backend:

  • nautilus_assume_constant(v, k): folds v to k along dominated uses,
    enabling polyvariant-style specialization without per-value recompilation
  • nautilus_assume_range(v, lo, hi): asserts lo <= v <= hi (inclusive),
    enabling bounds-check elimination and narrower codegen
  • nautilus_assume_nonzero(v): asserts v != 0, eliminating UB checks on
    div/ctz/clz

Adds behavioral tests mirroring cfWithAssume that verify the else
branch becomes unreachable when intrinsics are enabled.
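
To illustrate what the three hints promise to the optimizer, here is a minimal standalone sketch. It does not use the plugin's API: the `sketch_assume_*` helpers and the `SKETCH_ASSUME` macro are hypothetical stand-ins built on the GCC/Clang `__builtin_unreachable()` builtin, whereas the PR lowers the real hints to arith.cmpi + llvm.intr.assume. The functions `scaleByMode`, `lookup`, and `safeDiv` are likewise invented for illustration. In each one, the guarded fallback branch is the kind of "else" path an optimizer may drop once the assumption is visible. Calling any of them with a value that violates the assumption is undefined behavior in this sketch.

```cpp
#include <cstdint>

// Hypothetical stand-in for an "assume" hint (not the plugin's API).
// On GCC/Clang a false condition is declared unreachable; elsewhere the
// hint degrades to a no-op, which is safe because hints are optional.
#if defined(__clang__) || defined(__GNUC__)
#define SKETCH_ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
#define SKETCH_ASSUME(cond) ((void)0)
#endif

// Hypothetical counterparts of the three hints named in the description.
inline void sketch_assume_constant(std::int64_t v, std::int64_t k) { SKETCH_ASSUME(v == k); }
inline void sketch_assume_range(std::int64_t v, std::int64_t lo, std::int64_t hi) {
    SKETCH_ASSUME(v >= lo && v <= hi);  // inclusive on both ends
}
inline void sketch_assume_nonzero(std::int64_t v) { SKETCH_ASSUME(v != 0); }

// Constant hint: the optimizer may treat mode as 3 and drop the fallback.
std::int64_t scaleByMode(std::int64_t x, std::int64_t mode) {
    sketch_assume_constant(mode, 3);
    if (mode == 3) {
        return x * 3;
    }
    return -1;  // candidate for elimination under the assumption
}

// Range hint: the explicit bounds check becomes redundant.
std::int64_t lookup(const std::int64_t (&table)[8], std::int64_t idx) {
    sketch_assume_range(idx, 0, 7);
    if (idx < 0 || idx > 7) {
        return -1;  // candidate for elimination under the assumption
    }
    return table[idx];
}

// Nonzero hint: the division-by-zero guard becomes redundant.
std::int64_t safeDiv(std::int64_t a, std::int64_t b) {
    sketch_assume_nonzero(b);
    if (b == 0) {
        return 0;  // candidate for elimination under the assumption
    }
    return a / b;
}
```

Whether a given compiler actually removes the guarded branches depends on the optimization level and backend; the sketch only shows the contract, not a guaranteed transformation.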

Contributor

@github-actions github-actions bot left a comment

Tracing Benchmark

Columns: Benchmark suite, Current (789b0d3), Previous (95e8887), Ratio (Current / Previous)
ir_add 804.163 ns (± 49.1361) 828.84 ns (± 82.8408) 0.97
ir_ifThenElse 2.32709 us (± 191.762) 2.327 us (± 173.678) 1.00
ir_deeplyNestedIfElse 6.16274 us (± 703.903) 6.04255 us (± 892.815) 1.02
ir_loop 2.73662 us (± 291.126) 2.73721 us (± 341.761) 1.00
ir_ifInsideLoop 5.26198 us (± 453.858) 5.38331 us (± 719.711) 0.98
ir_loopDirectCall 2.88451 us (± 152.747) 3.06827 us (± 423.488) 0.94
ir_pointerLoop 3.49298 us (± 234.216) 3.53539 us (± 345.028) 0.99
ir_staticLoop 2.1419 us (± 206.921) 2.21064 us (± 284.943) 0.97
ir_fibonacci 2.96983 us (± 331.145) 2.90143 us (± 244.366) 1.02
ir_gcd 2.44577 us (± 187.211) 2.44606 us (± 270.622) 1.00
ir_nestedIf10 18.1623 us (± 1.10093) 17.9662 us (± 1.46283) 1.01
ir_nestedIf100 192.557 us (± 8.14834) 192.903 us (± 8.96549) 1.00
ir_chainedIf10 28.1593 us (± 3.46762) 27.1134 us (± 2.48612) 1.04
ir_chainedIf100 334.122 us (± 14.4497) 328.153 us (± 11.1599) 1.02
comp_mlir_add 8.25224 ms (± 168.738) 8.56063 ms (± 543.463) 0.96
comp_mlir_ifThenElse 8.88063 ms (± 206.291) 8.74628 ms (± 182.559) 1.02
comp_mlir_deeplyNestedIfElse 7.82322 ms (± 171.299) 7.65178 ms (± 154.069) 1.02
comp_mlir_loop 9.8849 ms (± 176.16) 9.97263 ms (± 369.535) 0.99
comp_mlir_ifInsideLoop 31.6519 ms (± 325.086) 31.6847 ms (± 628.998) 1.00
comp_mlir_loopDirectCall 14.7022 ms (± 210.273) 14.5333 ms (± 335.437) 1.01
comp_mlir_pointerLoop 30.8649 ms (± 326.553) 30.5773 ms (± 705.955) 1.01
comp_mlir_staticLoop 7.7511 ms (± 144.577) 7.77456 ms (± 253.18) 1.00
comp_mlir_fibonacci 13.3946 ms (± 214.648) 13.4356 ms (± 1.06491) 1.00
comp_mlir_gcd 12.3119 ms (± 211.599) 11.9623 ms (± 164.38) 1.03
comp_mlir_nestedIf10 13.2676 ms (± 180.702) 13.4457 ms (± 380.764) 0.99
comp_mlir_nestedIf100 27.4898 ms (± 357.858) 27.5814 ms (± 409.777) 1.00
comp_mlir_chainedIf10 12.3011 ms (± 219.649) 12.3656 ms (± 373.096) 0.99
comp_mlir_chainedIf100 23.0311 ms (± 385.274) 23.6196 ms (± 469.362) 0.98
comp_cpp_add 25.0149 ms (± 518.37) 26.389 ms (± 585.503) 0.95
comp_cpp_ifThenElse 25.7091 ms (± 385.531) 26.7309 ms (± 565.674) 0.96
comp_cpp_deeplyNestedIfElse 26.5699 ms (± 445.031) 27.4164 ms (± 444.738) 0.97
comp_cpp_loop 26.2727 ms (± 818.7) 26.5711 ms (± 558.246) 0.99
comp_cpp_ifInsideLoop 26.8971 ms (± 450.996) 27.4761 ms (± 568.475) 0.98
comp_cpp_loopDirectCall 26.2556 ms (± 362.265) 26.6092 ms (± 355.038) 0.99
comp_cpp_pointerLoop 26.4363 ms (± 427.592) 26.898 ms (± 564.159) 0.98
comp_cpp_staticLoop 25.5894 ms (± 429.693) 26.1455 ms (± 561.703) 0.98
comp_cpp_fibonacci 25.919 ms (± 417.809) 27.1894 ms (± 714.197) 0.95
comp_cpp_gcd 26.313 ms (± 613.073) 26.435 ms (± 749.21) 1.00
comp_cpp_nestedIf10 29.4225 ms (± 847.276) 29.7773 ms (± 500.117) 0.99
comp_cpp_nestedIf100 62.0275 ms (± 442.569) 63.0098 ms (± 633.939) 0.98
comp_cpp_chainedIf10 31.222 ms (± 528.816) 31.466 ms (± 483.136) 0.99
comp_cpp_chainedIf100 92.0857 ms (± 585.932) 92.387 ms (± 1.38384) 1.00
comp_bc_add 14.4159 us (± 2.40107) 14.5896 us (± 2.61709) 0.99
comp_bc_ifThenElse 17.7704 us (± 3.76308) 18.1622 us (± 3.80834) 0.98
comp_bc_deeplyNestedIfElse 22.1237 us (± 3.80645) 23.1227 us (± 4.36256) 0.96
comp_bc_loop 17.6422 us (± 3.19135) 18.4381 us (± 3.85839) 0.96
comp_bc_ifInsideLoop 20.5866 us (± 3.19829) 20.8386 us (± 3.31124) 0.99
comp_bc_loopDirectCall 18.2021 us (± 2.53557) 19.0956 us (± 3.80828) 0.95
comp_bc_pointerLoop 19.8676 us (± 4.1658) 20.3831 us (± 4.21148) 0.97
comp_bc_staticLoop 16.9233 us (± 3.27156) 16.8146 us (± 2.77516) 1.01
comp_bc_fibonacci 17.9033 us (± 2.68612) 18.598 us (± 3.40187) 0.96
comp_bc_gcd 17.7802 us (± 2.75739) 18.4043 us (± 3.6494) 0.97
comp_bc_nestedIf10 35.8066 us (± 4.64086) 35.5784 us (± 3.90233) 1.01
comp_bc_nestedIf100 187.468 us (± 10.4986) 186.483 us (± 12.5132) 1.01
comp_bc_chainedIf10 51.2015 us (± 6.73228) 51.7139 us (± 8.20885) 0.99
comp_bc_chainedIf100 293.741 us (± 16.55) 299.525 us (± 16.2828) 0.98
comp_asmjit_add 21.3278 us (± 4.30038) 21.73 us (± 5.16618) 0.98
comp_asmjit_ifThenElse 34.0883 us (± 4.89893) 35.2904 us (± 6.86847) 0.97
comp_asmjit_deeplyNestedIfElse 59.4085 us (± 11.7464) 62.0638 us (± 9.89244) 0.96
comp_asmjit_loop 34.9626 us (± 3.78091) 37.1357 us (± 5.52314) 0.94
comp_asmjit_ifInsideLoop 58.4597 us (± 9.32354) 59.7663 us (± 10.4637) 0.98
comp_asmjit_loopDirectCall 46.7219 us (± 9.6229) 47.6748 us (± 9.74794) 0.98
comp_asmjit_pointerLoop 49.3611 us (± 8.82469) 49.9905 us (± 8.30711) 0.99
comp_asmjit_staticLoop 27.8424 us (± 4.65157) 28.2548 us (± 4.55249) 0.99
comp_asmjit_fibonacci 44.0888 us (± 7.29862) 44.8958 us (± 8.54968) 0.98
comp_asmjit_gcd 35.9702 us (± 6.35088) 35.7774 us (± 5.19491) 1.01
comp_asmjit_nestedIf10 111.743 us (± 14.2653) 111.247 us (± 13.9778) 1.00
comp_asmjit_nestedIf100 1.18227 ms (± 18.1166) 1.13736 ms (± 22.5195) 1.04
comp_asmjit_chainedIf10 165.945 us (± 14.5918) 166.553 us (± 15.5738) 1.00
comp_asmjit_chainedIf100 2.30313 ms (± 23.6666) 2.29339 ms (± 33.5179) 1.00
ssa_add 192.255 ns (± 12.0211) 185.546 ns (± 6.1432) 1.04
ssa_ifThenElse 474.55 ns (± 30.4263) 472.625 ns (± 41.2207) 1.00
ssa_deeplyNestedIfElse 1.2063 us (± 90.5838) 1.17951 us (± 116.072) 1.02
ssa_loop 500.013 ns (± 67.9712) 483.839 ns (± 28.9373) 1.03
ssa_ifInsideLoop 927.35 ns (± 72.9021) 917.686 ns (± 89.7529) 1.01
ssa_loopDirectCall 495.685 ns (± 36.2361) 505.34 ns (± 39.7427) 0.98
ssa_pointerLoop 615.986 ns (± 47.6008) 613.64 ns (± 81.2383) 1.00
ssa_staticLoop 508.091 ns (± 33.7457) 497.755 ns (± 32.5169) 1.02
ssa_fibonacci 514.412 ns (± 40.985) 515.834 ns (± 39.9811) 1.00
ssa_gcd 454.523 ns (± 29.6668) 457.468 ns (± 32.8931) 0.99
exec_mlir_add 10.2293 ns (± 1.71769) 9.82703 ns (± 1.19291) 1.04
exec_mlir_fibonacci 14.1004 us (± 1.47076) 15.2455 us (± 1.80667) 0.92
exec_mlir_sum 546.468 us (± 18.752) 593.588 us (± 38.1341) 0.92
exec_cpp_add 4.68203 ns (± 0.830682) 4.78618 ns (± 0.866728) 0.98
exec_cpp_fibonacci 95.5695 us (± 6.48744) 97.4229 us (± 9.96581) 0.98
exec_cpp_sum 35.925 ms (± 83.6513) 36.0618 ms (± 726.102) 1.00
exec_bc_add 43.5881 ns (± 5.54132) 42.697 ns (± 4.97672) 1.02
exec_bc_fibonacci 898.068 us (± 5.27613) 900.218 us (± 11.7035) 1.00
exec_bc_sum 190.637 ms (± 226.698) 190.557 ms (± 580.979) 1.00
exec_asmjit_add 3.19725 ns (± 0.381171) 3.21561 ns (± 0.421704) 0.99
exec_asmjit_fibonacci 21.3435 us (± 2.24842) 21.4569 us (± 2.10364) 0.99
exec_asmjit_sum 4.5955 ms (± 16.5781) 4.60086 ms (± 20.9016) 1.00
trace_add 2.50348 us (± 243.496) 2.47069 us (± 250.901) 1.01
completing_trace_add 2.43994 us (± 186.519) 2.50548 us (± 250.353) 0.97
trace_ifThenElse 11.556 us (± 1.6907) 11.4901 us (± 1.56322) 1.01
completing_trace_ifThenElse 5.21821 us (± 582.476) 5.32981 us (± 605.531) 0.98
trace_deeplyNestedIfElse 35.1848 us (± 6.35506) 35.4085 us (± 6.26712) 0.99
completing_trace_deeplyNestedIfElse 15.8292 us (± 3.05378) 15.6684 us (± 2.3932) 1.01
trace_loop 11.332 us (± 1.74003) 11.3268 us (± 1.54147) 1.00
completing_trace_loop 5.2287 us (± 734.659) 5.31795 us (± 601.613) 0.98
trace_ifInsideLoop 22.4365 us (± 3.22473) 22.2736 us (± 3.03981) 1.01
completing_trace_ifInsideLoop 10.0847 us (± 1.23844) 10.2543 us (± 1.59787) 0.98
trace_loopDirectCall 11.4189 us (± 1.97748) 11.521 us (± 1.81108) 0.99
completing_trace_loopDirectCall 5.31181 us (± 643.825) 5.42546 us (± 657.84) 0.98
trace_pointerLoop 17.4342 us (± 3.09891) 17.8171 us (± 3.36746) 0.98
completing_trace_pointerLoop 11.5378 us (± 1.63406) 11.3919 us (± 1.60025) 1.01
trace_staticLoop 9.50942 us (± 1.15649) 9.23362 us (± 1.08211) 1.03
completing_trace_staticLoop 8.99809 us (± 1.0492) 8.95766 us (± 992.359) 1.00
trace_fibonacci 12.8087 us (± 1.73209) 12.8777 us (± 1.81245) 0.99
completing_trace_fibonacci 6.60468 us (± 731.337) 6.78543 us (± 860.229) 0.97
trace_gcd 10.4326 us (± 1.52799) 10.6716 us (± 1.6349) 0.98
completing_trace_gcd 4.51881 us (± 524.938) 4.58411 us (± 477.927) 0.99
trace_nestedIf10 56.4072 us (± 8.12164) 56.115 us (± 8.59747) 1.01
completing_trace_nestedIf10 55.2738 us (± 7.758) 55.2226 us (± 7.70709) 1.00
trace_nestedIf100 1.75839 ms (± 30.4956) 1.74536 ms (± 48.8679) 1.01
completing_trace_nestedIf100 1.82955 ms (± 40.209) 1.79904 ms (± 51.1365) 1.02
trace_chainedIf10 141.271 us (± 13.8633) 142.554 us (± 23.4004) 0.99
completing_trace_chainedIf10 72.6057 us (± 12.1594) 69.9689 us (± 8.48319) 1.04
trace_chainedIf100 5.11716 ms (± 40.8196) 5.13355 ms (± 153.958) 1.00
completing_trace_chainedIf100 2.77032 ms (± 33.2969) 2.76711 ms (± 38.7664) 1.00
exec_bc_addOne 35.6306 ns (± 4.60109) 34.7667 ns (± 4.73696) 1.02
exec_mlir_addOne 283.505 ns (± 7.09299) 279.973 ns (± 10.1435) 1.01
exec_cpp_addOne 4.02026 ns (± 0.597412) 4.09266 ns (± 0.843569) 0.98
exec_interpreted_addOne 37.7071 ns (± 3.08703) 38.7335 ns (± 3.81894) 0.97
tiered_compile_addOne 42.0005 us (± 11.4097) 43.0024 us (± 14.081) 0.98
single_compile_mlir_addOne 6.22877 ms (± 151.145) 6.29266 ms (± 226.594) 0.99
single_compile_cpp_addOne 25.4053 ms (± 432.846) 26.0653 ms (± 770.35) 0.97
single_compile_bc_addOne 42.4341 us (± 10.957) 43.4575 us (± 12.0455) 0.98
tiered_compile_sumLoop 62.0719 us (± 13.0222) 61.9168 us (± 13.0366) 1.00
single_compile_mlir_sumLoop 8.33694 ms (± 194.029) 8.39337 ms (± 317.623) 0.99
single_compile_cpp_sumLoop 25.933 ms (± 360.928) 26.7968 ms (± 664.836) 0.97
single_compile_bc_sumLoop 61.8387 us (± 12.4404) 63.0301 us (± 13.0382) 0.98
e2e_tiered_bc_to_mlir 43.1949 us (± 11.6387) 44.1003 us (± 11.8406) 0.98
e2e_single_mlir 8.1897 ms (± 146.342) 8.22028 ms (± 239.865) 1.00

This comment was automatically generated by workflow using github-action-benchmark.

2 participants