Add assume_constant, assume_range, assume_nonzero specializations #229
Open
PhilippGrulich wants to merge 1 commit into main from
Extend the specialization plugin with three new compile-time hints for int64_t values, lowered to arith.cmpi + llvm.intr.assume in the MLIR backend:

- nautilus_assume_constant(v, k): folds v to k along dominated uses, enabling polyvariant-style specialization without per-value recompile
- nautilus_assume_range(v, lo, hi): inclusive range assertion, enables bounds-check elimination and narrower codegen
- nautilus_assume_nonzero(v): eliminates UB checks on div/ctz/clz

Adds behavioral tests mirroring cfWithAssume that verify the else branch becomes unreachable when intrinsics are enabled.
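To illustrate what the three hints promise the optimizer, here is a minimal sketch in plain C++. It does not use the Nautilus API: the `SKETCH_ASSUME` macro, the `sketch_assume_*` helpers, and the `scaleBy`, `lookup`, and `safeDiv` functions are all hypothetical names introduced for this example, and compiler builtins stand in for the `llvm.intr.assume` lowering described above. As with any assume-style hint, violating the stated condition is undefined behavior.

```cpp
#include <cstdint>

// Hypothetical stand-in for an "assume" hint (not the Nautilus API).
// If the condition is false at runtime, behavior is undefined, which is
// what allows the optimizer to treat the condition as always true.
#if defined(__clang__)
#define SKETCH_ASSUME(cond) __builtin_assume(cond)
#elif defined(__GNUC__)
#define SKETCH_ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
#define SKETCH_ASSUME(cond) ((void)0)
#endif

// Assumed semantics, modeled on the PR description: v equals k.
inline void sketch_assume_constant(int64_t v, int64_t k) { SKETCH_ASSUME(v == k); }
// Assumed semantics: lo <= v <= hi (inclusive on both ends).
inline void sketch_assume_range(int64_t v, int64_t lo, int64_t hi) { SKETCH_ASSUME(v >= lo && v <= hi); }
// Assumed semantics: v is not zero.
inline void sketch_assume_nonzero(int64_t v) { SKETCH_ASSUME(v != 0); }

// With factor known to be 8, the optimizer may replace the multiply by a shift.
int64_t scaleBy(int64_t x, int64_t factor) {
    sketch_assume_constant(factor, 8);
    return x * factor;
}

// With idx known to lie in [0, 15], the optimizer may drop the bounds check.
int64_t lookup(const int64_t* table, int64_t idx) {
    sketch_assume_range(idx, 0, 15);
    if (idx < 0 || idx > 15) {
        return -1;  // candidate for elimination as unreachable
    }
    return table[idx];
}

// With b known to be nonzero, no divide-by-zero guard is needed.
int64_t safeDiv(int64_t a, int64_t b) {
    sketch_assume_nonzero(b);
    return a / b;
}
```

Callers must only pass values that satisfy the stated assumption, for example `scaleBy(5, 8)`, `lookup(table, 3)` with a 16-entry table, or `safeDiv(42, 7)`. Whether a given compiler actually performs the simplifications noted in the comments depends on the optimization level.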
Tracing Benchmark
| Benchmark suite | Current: 789b0d3 | Previous: 95e8887 | Ratio |
|---|---|---|---|
| ir_add | 804.163 ns (± 49.1361) | 828.84 ns (± 82.8408) | 0.97 |
| ir_ifThenElse | 2.32709 us (± 191.762) | 2.327 us (± 173.678) | 1.00 |
| ir_deeplyNestedIfElse | 6.16274 us (± 703.903) | 6.04255 us (± 892.815) | 1.02 |
| ir_loop | 2.73662 us (± 291.126) | 2.73721 us (± 341.761) | 1.00 |
| ir_ifInsideLoop | 5.26198 us (± 453.858) | 5.38331 us (± 719.711) | 0.98 |
| ir_loopDirectCall | 2.88451 us (± 152.747) | 3.06827 us (± 423.488) | 0.94 |
| ir_pointerLoop | 3.49298 us (± 234.216) | 3.53539 us (± 345.028) | 0.99 |
| ir_staticLoop | 2.1419 us (± 206.921) | 2.21064 us (± 284.943) | 0.97 |
| ir_fibonacci | 2.96983 us (± 331.145) | 2.90143 us (± 244.366) | 1.02 |
| ir_gcd | 2.44577 us (± 187.211) | 2.44606 us (± 270.622) | 1.00 |
| ir_nestedIf10 | 18.1623 us (± 1.10093) | 17.9662 us (± 1.46283) | 1.01 |
| ir_nestedIf100 | 192.557 us (± 8.14834) | 192.903 us (± 8.96549) | 1.00 |
| ir_chainedIf10 | 28.1593 us (± 3.46762) | 27.1134 us (± 2.48612) | 1.04 |
| ir_chainedIf100 | 334.122 us (± 14.4497) | 328.153 us (± 11.1599) | 1.02 |
| comp_mlir_add | 8.25224 ms (± 168.738) | 8.56063 ms (± 543.463) | 0.96 |
| comp_mlir_ifThenElse | 8.88063 ms (± 206.291) | 8.74628 ms (± 182.559) | 1.02 |
| comp_mlir_deeplyNestedIfElse | 7.82322 ms (± 171.299) | 7.65178 ms (± 154.069) | 1.02 |
| comp_mlir_loop | 9.8849 ms (± 176.16) | 9.97263 ms (± 369.535) | 0.99 |
| comp_mlir_ifInsideLoop | 31.6519 ms (± 325.086) | 31.6847 ms (± 628.998) | 1.00 |
| comp_mlir_loopDirectCall | 14.7022 ms (± 210.273) | 14.5333 ms (± 335.437) | 1.01 |
| comp_mlir_pointerLoop | 30.8649 ms (± 326.553) | 30.5773 ms (± 705.955) | 1.01 |
| comp_mlir_staticLoop | 7.7511 ms (± 144.577) | 7.77456 ms (± 253.18) | 1.00 |
| comp_mlir_fibonacci | 13.3946 ms (± 214.648) | 13.4356 ms (± 1.06491) | 1.00 |
| comp_mlir_gcd | 12.3119 ms (± 211.599) | 11.9623 ms (± 164.38) | 1.03 |
| comp_mlir_nestedIf10 | 13.2676 ms (± 180.702) | 13.4457 ms (± 380.764) | 0.99 |
| comp_mlir_nestedIf100 | 27.4898 ms (± 357.858) | 27.5814 ms (± 409.777) | 1.00 |
| comp_mlir_chainedIf10 | 12.3011 ms (± 219.649) | 12.3656 ms (± 373.096) | 0.99 |
| comp_mlir_chainedIf100 | 23.0311 ms (± 385.274) | 23.6196 ms (± 469.362) | 0.98 |
| comp_cpp_add | 25.0149 ms (± 518.37) | 26.389 ms (± 585.503) | 0.95 |
| comp_cpp_ifThenElse | 25.7091 ms (± 385.531) | 26.7309 ms (± 565.674) | 0.96 |
| comp_cpp_deeplyNestedIfElse | 26.5699 ms (± 445.031) | 27.4164 ms (± 444.738) | 0.97 |
| comp_cpp_loop | 26.2727 ms (± 818.7) | 26.5711 ms (± 558.246) | 0.99 |
| comp_cpp_ifInsideLoop | 26.8971 ms (± 450.996) | 27.4761 ms (± 568.475) | 0.98 |
| comp_cpp_loopDirectCall | 26.2556 ms (± 362.265) | 26.6092 ms (± 355.038) | 0.99 |
| comp_cpp_pointerLoop | 26.4363 ms (± 427.592) | 26.898 ms (± 564.159) | 0.98 |
| comp_cpp_staticLoop | 25.5894 ms (± 429.693) | 26.1455 ms (± 561.703) | 0.98 |
| comp_cpp_fibonacci | 25.919 ms (± 417.809) | 27.1894 ms (± 714.197) | 0.95 |
| comp_cpp_gcd | 26.313 ms (± 613.073) | 26.435 ms (± 749.21) | 1.00 |
| comp_cpp_nestedIf10 | 29.4225 ms (± 847.276) | 29.7773 ms (± 500.117) | 0.99 |
| comp_cpp_nestedIf100 | 62.0275 ms (± 442.569) | 63.0098 ms (± 633.939) | 0.98 |
| comp_cpp_chainedIf10 | 31.222 ms (± 528.816) | 31.466 ms (± 483.136) | 0.99 |
| comp_cpp_chainedIf100 | 92.0857 ms (± 585.932) | 92.387 ms (± 1.38384) | 1.00 |
| comp_bc_add | 14.4159 us (± 2.40107) | 14.5896 us (± 2.61709) | 0.99 |
| comp_bc_ifThenElse | 17.7704 us (± 3.76308) | 18.1622 us (± 3.80834) | 0.98 |
| comp_bc_deeplyNestedIfElse | 22.1237 us (± 3.80645) | 23.1227 us (± 4.36256) | 0.96 |
| comp_bc_loop | 17.6422 us (± 3.19135) | 18.4381 us (± 3.85839) | 0.96 |
| comp_bc_ifInsideLoop | 20.5866 us (± 3.19829) | 20.8386 us (± 3.31124) | 0.99 |
| comp_bc_loopDirectCall | 18.2021 us (± 2.53557) | 19.0956 us (± 3.80828) | 0.95 |
| comp_bc_pointerLoop | 19.8676 us (± 4.1658) | 20.3831 us (± 4.21148) | 0.97 |
| comp_bc_staticLoop | 16.9233 us (± 3.27156) | 16.8146 us (± 2.77516) | 1.01 |
| comp_bc_fibonacci | 17.9033 us (± 2.68612) | 18.598 us (± 3.40187) | 0.96 |
| comp_bc_gcd | 17.7802 us (± 2.75739) | 18.4043 us (± 3.6494) | 0.97 |
| comp_bc_nestedIf10 | 35.8066 us (± 4.64086) | 35.5784 us (± 3.90233) | 1.01 |
| comp_bc_nestedIf100 | 187.468 us (± 10.4986) | 186.483 us (± 12.5132) | 1.01 |
| comp_bc_chainedIf10 | 51.2015 us (± 6.73228) | 51.7139 us (± 8.20885) | 0.99 |
| comp_bc_chainedIf100 | 293.741 us (± 16.55) | 299.525 us (± 16.2828) | 0.98 |
| comp_asmjit_add | 21.3278 us (± 4.30038) | 21.73 us (± 5.16618) | 0.98 |
| comp_asmjit_ifThenElse | 34.0883 us (± 4.89893) | 35.2904 us (± 6.86847) | 0.97 |
| comp_asmjit_deeplyNestedIfElse | 59.4085 us (± 11.7464) | 62.0638 us (± 9.89244) | 0.96 |
| comp_asmjit_loop | 34.9626 us (± 3.78091) | 37.1357 us (± 5.52314) | 0.94 |
| comp_asmjit_ifInsideLoop | 58.4597 us (± 9.32354) | 59.7663 us (± 10.4637) | 0.98 |
| comp_asmjit_loopDirectCall | 46.7219 us (± 9.6229) | 47.6748 us (± 9.74794) | 0.98 |
| comp_asmjit_pointerLoop | 49.3611 us (± 8.82469) | 49.9905 us (± 8.30711) | 0.99 |
| comp_asmjit_staticLoop | 27.8424 us (± 4.65157) | 28.2548 us (± 4.55249) | 0.99 |
| comp_asmjit_fibonacci | 44.0888 us (± 7.29862) | 44.8958 us (± 8.54968) | 0.98 |
| comp_asmjit_gcd | 35.9702 us (± 6.35088) | 35.7774 us (± 5.19491) | 1.01 |
| comp_asmjit_nestedIf10 | 111.743 us (± 14.2653) | 111.247 us (± 13.9778) | 1.00 |
| comp_asmjit_nestedIf100 | 1.18227 ms (± 18.1166) | 1.13736 ms (± 22.5195) | 1.04 |
| comp_asmjit_chainedIf10 | 165.945 us (± 14.5918) | 166.553 us (± 15.5738) | 1.00 |
| comp_asmjit_chainedIf100 | 2.30313 ms (± 23.6666) | 2.29339 ms (± 33.5179) | 1.00 |
| ssa_add | 192.255 ns (± 12.0211) | 185.546 ns (± 6.1432) | 1.04 |
| ssa_ifThenElse | 474.55 ns (± 30.4263) | 472.625 ns (± 41.2207) | 1.00 |
| ssa_deeplyNestedIfElse | 1.2063 us (± 90.5838) | 1.17951 us (± 116.072) | 1.02 |
| ssa_loop | 500.013 ns (± 67.9712) | 483.839 ns (± 28.9373) | 1.03 |
| ssa_ifInsideLoop | 927.35 ns (± 72.9021) | 917.686 ns (± 89.7529) | 1.01 |
| ssa_loopDirectCall | 495.685 ns (± 36.2361) | 505.34 ns (± 39.7427) | 0.98 |
| ssa_pointerLoop | 615.986 ns (± 47.6008) | 613.64 ns (± 81.2383) | 1.00 |
| ssa_staticLoop | 508.091 ns (± 33.7457) | 497.755 ns (± 32.5169) | 1.02 |
| ssa_fibonacci | 514.412 ns (± 40.985) | 515.834 ns (± 39.9811) | 1.00 |
| ssa_gcd | 454.523 ns (± 29.6668) | 457.468 ns (± 32.8931) | 0.99 |
| exec_mlir_add | 10.2293 ns (± 1.71769) | 9.82703 ns (± 1.19291) | 1.04 |
| exec_mlir_fibonacci | 14.1004 us (± 1.47076) | 15.2455 us (± 1.80667) | 0.92 |
| exec_mlir_sum | 546.468 us (± 18.752) | 593.588 us (± 38.1341) | 0.92 |
| exec_cpp_add | 4.68203 ns (± 0.830682) | 4.78618 ns (± 0.866728) | 0.98 |
| exec_cpp_fibonacci | 95.5695 us (± 6.48744) | 97.4229 us (± 9.96581) | 0.98 |
| exec_cpp_sum | 35.925 ms (± 83.6513) | 36.0618 ms (± 726.102) | 1.00 |
| exec_bc_add | 43.5881 ns (± 5.54132) | 42.697 ns (± 4.97672) | 1.02 |
| exec_bc_fibonacci | 898.068 us (± 5.27613) | 900.218 us (± 11.7035) | 1.00 |
| exec_bc_sum | 190.637 ms (± 226.698) | 190.557 ms (± 580.979) | 1.00 |
| exec_asmjit_add | 3.19725 ns (± 0.381171) | 3.21561 ns (± 0.421704) | 0.99 |
| exec_asmjit_fibonacci | 21.3435 us (± 2.24842) | 21.4569 us (± 2.10364) | 0.99 |
| exec_asmjit_sum | 4.5955 ms (± 16.5781) | 4.60086 ms (± 20.9016) | 1.00 |
| trace_add | 2.50348 us (± 243.496) | 2.47069 us (± 250.901) | 1.01 |
| completing_trace_add | 2.43994 us (± 186.519) | 2.50548 us (± 250.353) | 0.97 |
| trace_ifThenElse | 11.556 us (± 1.6907) | 11.4901 us (± 1.56322) | 1.01 |
| completing_trace_ifThenElse | 5.21821 us (± 582.476) | 5.32981 us (± 605.531) | 0.98 |
| trace_deeplyNestedIfElse | 35.1848 us (± 6.35506) | 35.4085 us (± 6.26712) | 0.99 |
| completing_trace_deeplyNestedIfElse | 15.8292 us (± 3.05378) | 15.6684 us (± 2.3932) | 1.01 |
| trace_loop | 11.332 us (± 1.74003) | 11.3268 us (± 1.54147) | 1.00 |
| completing_trace_loop | 5.2287 us (± 734.659) | 5.31795 us (± 601.613) | 0.98 |
| trace_ifInsideLoop | 22.4365 us (± 3.22473) | 22.2736 us (± 3.03981) | 1.01 |
| completing_trace_ifInsideLoop | 10.0847 us (± 1.23844) | 10.2543 us (± 1.59787) | 0.98 |
| trace_loopDirectCall | 11.4189 us (± 1.97748) | 11.521 us (± 1.81108) | 0.99 |
| completing_trace_loopDirectCall | 5.31181 us (± 643.825) | 5.42546 us (± 657.84) | 0.98 |
| trace_pointerLoop | 17.4342 us (± 3.09891) | 17.8171 us (± 3.36746) | 0.98 |
| completing_trace_pointerLoop | 11.5378 us (± 1.63406) | 11.3919 us (± 1.60025) | 1.01 |
| trace_staticLoop | 9.50942 us (± 1.15649) | 9.23362 us (± 1.08211) | 1.03 |
| completing_trace_staticLoop | 8.99809 us (± 1.0492) | 8.95766 us (± 992.359) | 1.00 |
| trace_fibonacci | 12.8087 us (± 1.73209) | 12.8777 us (± 1.81245) | 0.99 |
| completing_trace_fibonacci | 6.60468 us (± 731.337) | 6.78543 us (± 860.229) | 0.97 |
| trace_gcd | 10.4326 us (± 1.52799) | 10.6716 us (± 1.6349) | 0.98 |
| completing_trace_gcd | 4.51881 us (± 524.938) | 4.58411 us (± 477.927) | 0.99 |
| trace_nestedIf10 | 56.4072 us (± 8.12164) | 56.115 us (± 8.59747) | 1.01 |
| completing_trace_nestedIf10 | 55.2738 us (± 7.758) | 55.2226 us (± 7.70709) | 1.00 |
| trace_nestedIf100 | 1.75839 ms (± 30.4956) | 1.74536 ms (± 48.8679) | 1.01 |
| completing_trace_nestedIf100 | 1.82955 ms (± 40.209) | 1.79904 ms (± 51.1365) | 1.02 |
| trace_chainedIf10 | 141.271 us (± 13.8633) | 142.554 us (± 23.4004) | 0.99 |
| completing_trace_chainedIf10 | 72.6057 us (± 12.1594) | 69.9689 us (± 8.48319) | 1.04 |
| trace_chainedIf100 | 5.11716 ms (± 40.8196) | 5.13355 ms (± 153.958) | 1.00 |
| completing_trace_chainedIf100 | 2.77032 ms (± 33.2969) | 2.76711 ms (± 38.7664) | 1.00 |
| exec_bc_addOne | 35.6306 ns (± 4.60109) | 34.7667 ns (± 4.73696) | 1.02 |
| exec_mlir_addOne | 283.505 ns (± 7.09299) | 279.973 ns (± 10.1435) | 1.01 |
| exec_cpp_addOne | 4.02026 ns (± 0.597412) | 4.09266 ns (± 0.843569) | 0.98 |
| exec_interpreted_addOne | 37.7071 ns (± 3.08703) | 38.7335 ns (± 3.81894) | 0.97 |
| tiered_compile_addOne | 42.0005 us (± 11.4097) | 43.0024 us (± 14.081) | 0.98 |
| single_compile_mlir_addOne | 6.22877 ms (± 151.145) | 6.29266 ms (± 226.594) | 0.99 |
| single_compile_cpp_addOne | 25.4053 ms (± 432.846) | 26.0653 ms (± 770.35) | 0.97 |
| single_compile_bc_addOne | 42.4341 us (± 10.957) | 43.4575 us (± 12.0455) | 0.98 |
| tiered_compile_sumLoop | 62.0719 us (± 13.0222) | 61.9168 us (± 13.0366) | 1.00 |
| single_compile_mlir_sumLoop | 8.33694 ms (± 194.029) | 8.39337 ms (± 317.623) | 0.99 |
| single_compile_cpp_sumLoop | 25.933 ms (± 360.928) | 26.7968 ms (± 664.836) | 0.97 |
| single_compile_bc_sumLoop | 61.8387 us (± 12.4404) | 63.0301 us (± 13.0382) | 0.98 |
| e2e_tiered_bc_to_mlir | 43.1949 us (± 11.6387) | 44.1003 us (± 11.8406) | 0.98 |
| e2e_single_mlir | 8.1897 ms (± 146.342) | 8.22028 ms (± 239.865) | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.