Add ValueProfile and specialize() helper for argument value specialization #231
Open
PhilippGrulich wants to merge 1 commit into main from
Force-pushed cd019a1 to 29801bc.
Tracing Benchmark
| Benchmark suite | Current: 20bc8ad | Previous: 4ddc690 | Ratio (current / previous) |
|---|---|---|---|
| ssa_add | 190.88 ns (± 17.6943) | 196.852 ns (± 18.2072) | 0.97 |
| ssa_ifThenElse | 476.925 ns (± 44.4928) | 486.861 ns (± 70.2234) | 0.98 |
| ssa_deeplyNestedIfElse | 1.16644 us (± 87.9243) | 1.1809 us (± 106.412) | 0.99 |
| ssa_loop | 494.202 ns (± 41.4552) | 514.039 ns (± 54.5865) | 0.96 |
| ssa_ifInsideLoop | 902.565 ns (± 58.5923) | 943.877 ns (± 70.5854) | 0.96 |
| ssa_loopDirectCall | 500.529 ns (± 44.2476) | 506.317 ns (± 59.915) | 0.99 |
| ssa_pointerLoop | 604.171 ns (± 44.4348) | 614.184 ns (± 51.7513) | 0.98 |
| ssa_staticLoop | 501.13 ns (± 24.2694) | 541.031 ns (± 83.2034) | 0.93 |
| ssa_fibonacci | 511.478 ns (± 37.3635) | 527.105 ns (± 56.114) | 0.97 |
| ssa_gcd | 486.642 ns (± 60.6796) | 487.012 ns (± 85.6455) | 1.00 |
| comp_mlir_add | 8.19096 ms (± 369.505) | 8.38057 ms (± 192.605) | 0.98 |
| comp_mlir_ifThenElse | 8.68063 ms (± 101.675) | 9.14395 ms (± 187.901) | 0.95 |
| comp_mlir_deeplyNestedIfElse | 7.58213 ms (± 86.16) | 7.7788 ms (± 165.806) | 0.97 |
| comp_mlir_loop | 9.63447 ms (± 169.805) | 9.9357 ms (± 189.58) | 0.97 |
| comp_mlir_ifInsideLoop | 30.7378 ms (± 242.396) | 31.4149 ms (± 339.44) | 0.98 |
| comp_mlir_loopDirectCall | 14.15 ms (± 137.355) | 14.7687 ms (± 880.984) | 0.96 |
| comp_mlir_pointerLoop | 29.8684 ms (± 236.03) | 30.4114 ms (± 282.375) | 0.98 |
| comp_mlir_staticLoop | 7.52574 ms (± 91.2368) | 7.69672 ms (± 127.901) | 0.98 |
| comp_mlir_fibonacci | 12.8877 ms (± 287.091) | 13.2546 ms (± 177.381) | 0.97 |
| comp_mlir_gcd | 11.8734 ms (± 159.612) | 12.4331 ms (± 330.253) | 0.95 |
| comp_mlir_nestedIf10 | 12.9695 ms (± 152.78) | 13.5025 ms (± 306.971) | 0.96 |
| comp_mlir_nestedIf100 | 27.6422 ms (± 325.914) | 27.4857 ms (± 419.81) | 1.01 |
| comp_mlir_chainedIf10 | 11.9553 ms (± 210.644) | 12.7826 ms (± 270.187) | 0.94 |
| comp_mlir_chainedIf100 | 22.6107 ms (± 177.664) | 24.0902 ms (± 540.118) | 0.94 |
| comp_cpp_add | 24.5071 ms (± 355.538) | 25.6204 ms (± 845.684) | 0.96 |
| comp_cpp_ifThenElse | 25.0888 ms (± 297.095) | 25.9943 ms (± 587.031) | 0.97 |
| comp_cpp_deeplyNestedIfElse | 26.1689 ms (± 382.855) | 27.8513 ms (± 1.22198) | 0.94 |
| comp_cpp_loop | 25.2702 ms (± 308.487) | 26.5132 ms (± 801.141) | 0.95 |
| comp_cpp_ifInsideLoop | 26.0768 ms (± 263.189) | 27.3091 ms (± 703.368) | 0.95 |
| comp_cpp_loopDirectCall | 25.469 ms (± 319.87) | 27.1496 ms (± 557.475) | 0.94 |
| comp_cpp_pointerLoop | 25.6961 ms (± 268.471) | 27.0546 ms (± 527.374) | 0.95 |
| comp_cpp_staticLoop | 24.8914 ms (± 254.397) | 25.7821 ms (± 833.768) | 0.97 |
| comp_cpp_fibonacci | 25.325 ms (± 248.615) | 26.1467 ms (± 746.022) | 0.97 |
| comp_cpp_gcd | 25.1555 ms (± 277.306) | 26.283 ms (± 1.10201) | 0.96 |
| comp_cpp_nestedIf10 | 27.9958 ms (± 277.645) | 28.2778 ms (± 475.46) | 0.99 |
| comp_cpp_nestedIf100 | 61.5579 ms (± 370.596) | 62.5057 ms (± 680.203) | 0.98 |
| comp_cpp_chainedIf10 | 30.5678 ms (± 285.832) | 30.8186 ms (± 570.568) | 0.99 |
| comp_cpp_chainedIf100 | 91.1343 ms (± 844.182) | 94.8954 ms (± 3.63672) | 0.96 |
| comp_bc_add | 14.2364 us (± 1.75678) | 14.5932 us (± 2.09862) | 0.98 |
| comp_bc_ifThenElse | 17.8769 us (± 2.40526) | 17.5758 us (± 2.73918) | 1.02 |
| comp_bc_deeplyNestedIfElse | 22.1569 us (± 2.56091) | 24.34 us (± 5.64301) | 0.91 |
| comp_bc_loop | 17.8982 us (± 2.32555) | 18.7744 us (± 4.12069) | 0.95 |
| comp_bc_ifInsideLoop | 20.4437 us (± 2.44517) | 20.7524 us (± 2.69374) | 0.99 |
| comp_bc_loopDirectCall | 18.9932 us (± 2.85576) | 19.1342 us (± 2.85075) | 0.99 |
| comp_bc_pointerLoop | 19.7144 us (± 2.77798) | 20.412 us (± 3.05277) | 0.97 |
| comp_bc_staticLoop | 16.8452 us (± 2.86847) | 17.3441 us (± 3.25445) | 0.97 |
| comp_bc_fibonacci | 18.3064 us (± 2.43635) | 19.1498 us (± 2.86824) | 0.96 |
| comp_bc_gcd | 18.0169 us (± 2.93763) | 18.3923 us (± 3.5189) | 0.98 |
| comp_bc_nestedIf10 | 35.0084 us (± 3.47404) | 35.96 us (± 4.1926) | 0.97 |
| comp_bc_nestedIf100 | 176.776 us (± 7.80642) | 201.72 us (± 33.1657) | 0.88 |
| comp_bc_chainedIf10 | 50.2744 us (± 8.30082) | 55.8101 us (± 13.0197) | 0.90 |
| comp_bc_chainedIf100 | 277.791 us (± 10.7667) | 302.553 us (± 23.62) | 0.92 |
| comp_asmjit_add | 21.2692 us (± 4.01024) | 22.3937 us (± 4.48756) | 0.95 |
| comp_asmjit_ifThenElse | 34.2081 us (± 4.59567) | 34.8408 us (± 4.85456) | 0.98 |
| comp_asmjit_deeplyNestedIfElse | 58.8082 us (± 8.78841) | 60.2774 us (± 9.17425) | 0.98 |
| comp_asmjit_loop | 35.8823 us (± 3.84668) | 42.025 us (± 7.05867) | 0.85 |
| comp_asmjit_ifInsideLoop | 57.5709 us (± 6.86521) | 60.0379 us (± 10.8082) | 0.96 |
| comp_asmjit_loopDirectCall | 46.399 us (± 6.97435) | 47.8621 us (± 9.77711) | 0.97 |
| comp_asmjit_pointerLoop | 48.3917 us (± 6.45883) | 50.4541 us (± 11.156) | 0.96 |
| comp_asmjit_staticLoop | 28.1258 us (± 4.00051) | 28.8697 us (± 4.41559) | 0.97 |
| comp_asmjit_fibonacci | 44.1299 us (± 6.61175) | 45.3356 us (± 9.25608) | 0.97 |
| comp_asmjit_gcd | 36.4854 us (± 6.43279) | 35.7476 us (± 5.30607) | 1.02 |
| comp_asmjit_nestedIf10 | 109.963 us (± 11.6552) | 111.032 us (± 14.5965) | 0.99 |
| comp_asmjit_nestedIf100 | 1.1385 ms (± 12.9426) | 1.14527 ms (± 20.1439) | 0.99 |
| comp_asmjit_chainedIf10 | 162.08 us (± 11.4284) | 166.561 us (± 15.152) | 0.97 |
| comp_asmjit_chainedIf100 | 2.28201 ms (± 25.9004) | 2.27493 ms (± 36.9212) | 1.00 |
| ir_add | 854.808 ns (± 71.7311) | 899.987 ns (± 116.357) | 0.95 |
| ir_ifThenElse | 2.40972 us (± 165.045) | 2.50916 us (± 309.896) | 0.96 |
| ir_deeplyNestedIfElse | 6.86063 us (± 931.152) | 6.62996 us (± 689.783) | 1.03 |
| ir_loop | 2.97028 us (± 455.568) | 3.15779 us (± 491.542) | 0.94 |
| ir_ifInsideLoop | 5.55614 us (± 390.397) | 5.78225 us (± 651.21) | 0.96 |
| ir_loopDirectCall | 3.13579 us (± 224.141) | 3.43855 us (± 555.66) | 0.91 |
| ir_pointerLoop | 3.6634 us (± 305.335) | 3.81923 us (± 449.197) | 0.96 |
| ir_staticLoop | 2.20591 us (± 164.315) | 2.28226 us (± 197.398) | 0.97 |
| ir_fibonacci | 3.09834 us (± 252.729) | 3.15796 us (± 298.215) | 0.98 |
| ir_gcd | 2.66002 us (± 246.515) | 2.71337 us (± 324.906) | 0.98 |
| ir_nestedIf10 | 15.4621 us (± 1.21789) | 16.2159 us (± 1.70068) | 0.95 |
| ir_nestedIf100 | 187.069 us (± 6.41364) | 195.124 us (± 18.2958) | 0.96 |
| ir_chainedIf10 | 28.3291 us (± 1.73383) | 29.4978 us (± 3.21451) | 0.96 |
| ir_chainedIf100 | 358.614 us (± 9.74037) | 372.238 us (± 13.6387) | 0.96 |
| tiered_compile_addOne | 41.4017 us (± 7.66667) | 36.1323 us (± 10.3872) | 1.15 |
| single_compile_mlir_addOne | 6.07201 ms (± 89.374) | 6.69496 ms (± 416.394) | 0.91 |
| single_compile_cpp_addOne | 24.3658 ms (± 268.6) | 26.6951 ms (± 481.83) | 0.91 |
| single_compile_bc_addOne | 41.6916 us (± 5.95621) | 42.6381 us (± 11.2121) | 0.98 |
| tiered_compile_sumLoop | 59.9794 us (± 8.56729) | 64.5505 us (± 12.6459) | 0.93 |
| single_compile_mlir_sumLoop | 8.09942 ms (± 115.888) | 9.04856 ms (± 286.628) | 0.90 |
| single_compile_cpp_sumLoop | 24.8437 ms (± 263.17) | 27.4246 ms (± 571.506) | 0.91 |
| single_compile_bc_sumLoop | 60.3862 us (± 6.61555) | 61.7793 us (± 14.2547) | 0.98 |
| trace_add | 2.55804 us (± 199.179) | 2.56751 us (± 312.946) | 1.00 |
| completing_trace_add | 2.67806 us (± 590.93) | 2.62229 us (± 366.074) | 1.02 |
| trace_ifThenElse | 11.673 us (± 1.37586) | 12.1875 us (± 2.1946) | 0.96 |
| completing_trace_ifThenElse | 5.35939 us (± 507.912) | 5.66397 us (± 836.071) | 0.95 |
| trace_deeplyNestedIfElse | 35.4159 us (± 4.26678) | 35.7092 us (± 7.02971) | 0.99 |
| completing_trace_deeplyNestedIfElse | 15.5455 us (± 1.61045) | 15.7523 us (± 3.59692) | 0.99 |
| trace_loop | 11.5549 us (± 1.57806) | 11.3827 us (± 1.74982) | 1.02 |
| completing_trace_loop | 5.30657 us (± 472.132) | 5.34366 us (± 671.493) | 0.99 |
| trace_ifInsideLoop | 22.2748 us (± 2.05627) | 23.2593 us (± 3.89688) | 0.96 |
| completing_trace_ifInsideLoop | 10.1716 us (± 1.08231) | 10.3211 us (± 1.4055) | 0.99 |
| trace_loopDirectCall | 11.3351 us (± 1.0882) | 11.5762 us (± 1.87128) | 0.98 |
| completing_trace_loopDirectCall | 5.37777 us (± 658.769) | 5.55179 us (± 776.341) | 0.97 |
| trace_pointerLoop | 17.439 us (± 2.75791) | 17.7094 us (± 3.40857) | 0.98 |
| completing_trace_pointerLoop | 11.3092 us (± 1.06916) | 11.8254 us (± 1.97681) | 0.96 |
| trace_staticLoop | 9.16521 us (± 769.186) | 9.39437 us (± 1.13559) | 0.98 |
| completing_trace_staticLoop | 8.67855 us (± 784.69) | 9.27847 us (± 1.29479) | 0.94 |
| trace_fibonacci | 12.7072 us (± 1.14803) | 13.8422 us (± 3.27903) | 0.92 |
| completing_trace_fibonacci | 6.68079 us (± 888.898) | 7.03076 us (± 996.33) | 0.95 |
| trace_gcd | 10.6098 us (± 1.22677) | 10.8536 us (± 2.31146) | 0.98 |
| completing_trace_gcd | 4.47704 us (± 327.454) | 4.59746 us (± 532.736) | 0.97 |
| trace_nestedIf10 | 55.4052 us (± 5.78169) | 56.6039 us (± 9.13368) | 0.98 |
| completing_trace_nestedIf10 | 54.5952 us (± 5.8429) | 57.1039 us (± 8.66097) | 0.96 |
| trace_nestedIf100 | 1.7427 ms (± 48.1858) | 1.75731 ms (± 48.612) | 0.99 |
| completing_trace_nestedIf100 | 1.79803 ms (± 38.0432) | 1.80528 ms (± 61.1031) | 1.00 |
| trace_chainedIf10 | 136.125 us (± 6.8899) | 142.906 us (± 20.5703) | 0.95 |
| completing_trace_chainedIf10 | 70.1246 us (± 10.3851) | 71.253 us (± 10.5092) | 0.98 |
| trace_chainedIf100 | 5.10896 ms (± 42.3906) | 5.11334 ms (± 59.9812) | 1.00 |
| completing_trace_chainedIf100 | 2.74956 ms (± 29.7741) | 2.77679 ms (± 53.4349) | 0.99 |
| e2e_tiered_bc_to_mlir | 43.1178 us (± 9.90148) | 42.742 us (± 12.6185) | 1.01 |
| e2e_single_mlir | 7.92751 ms (± 76.5823) | 8.36387 ms (± 191.751) | 0.95 |
| exec_mlir_add | 9.86577 ns (± 0.881544) | 10.2965 ns (± 1.03277) | 0.96 |
| exec_mlir_fibonacci | 13.1156 us (± 888.427) | 15.9968 us (± 2.35392) | 0.82 |
| exec_mlir_sum | 530.327 us (± 19.1346) | 594.472 us (± 29.2638) | 0.89 |
| exec_cpp_add | 4.70183 ns (± 0.794904) | 4.73691 ns (± 0.799371) | 0.99 |
| exec_cpp_fibonacci | 96.4755 us (± 9.99885) | 99.3508 us (± 15.1486) | 0.97 |
| exec_cpp_sum | 35.9154 ms (± 112.524) | 35.9916 ms (± 159.868) | 1.00 |
| exec_bc_add | 42.9882 ns (± 4.07597) | 44.2275 ns (± 6.98237) | 0.97 |
| exec_bc_fibonacci | 899.318 us (± 7.32116) | 902.474 us (± 10.0672) | 1.00 |
| exec_bc_sum | 190.626 ms (± 291.854) | 190.799 ms (± 397.461) | 1.00 |
| exec_asmjit_add | 3.21068 ns (± 0.278466) | 3.24248 ns (± 0.494265) | 0.99 |
| exec_asmjit_fibonacci | 22.6468 us (± 5.57331) | 21.3141 us (± 1.36548) | 1.06 |
| exec_asmjit_sum | 4.61201 ms (± 28.2952) | 4.62116 ms (± 62.4678) | 1.00 |
| exec_bc_addOne | 36.1161 ns (± 4.8241) | 37.2873 ns (± 7.38083) | 0.97 |
| exec_mlir_addOne | 274.094 ns (± 3.33802) | 299.046 ns (± 11.2662) | 0.92 |
| exec_cpp_addOne | 3.9852 ns (± 0.689287) | 4.03145 ns (± 0.734027) | 0.99 |
| exec_interpreted_addOne | 38.0425 ns (± 1.89879) | 38.1182 ns (± 2.30459) | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Force-pushed 4c65973 to 30c1e3b.
Adds the nautilus specialization plugin, providing:

- `ValueProfile<T>` for runtime argument value profiling
- a `SpecializedNautilusFunction` wrapper that behaves like `NautilusFunction` but emits a nested dispatcher function, which routes calls to a specialized or generic compiled variant based on a stable profile
- `assume()` intrinsic plumbing (existing)

Includes backend-parameterized behavioural tests across all enabled code-gen backends (mlir/cpp/bc/asmjit) plus the interpreter, MLIR IR-shape inspection tests, and CI integration for both the regular and LLVM IR test executables.

Also fixes an `MLIRLoweringProvider` bug: `blockMapping` was not cleared between successive `generateFunction` calls, which caused 'reference to block defined in another region' errors when multiple functions with identically named blocks coexisted in one module.
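For readers unfamiliar with profile-guided dispatch, the routing idea can be sketched in plain C++. This is not the plugin's code: `ValueProfileSketch`, `DispatcherSketch`, the "same value N times in a row" stability rule, and the `threshold` parameter are all hypothetical, and the "specialized variant" here is just a closure over the constant rather than a separately compiled function.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>

// Hypothetical stand-in for ValueProfile<T>: remembers the last observed
// value and how many consecutive calls have seen it.
template <typename T>
class ValueProfileSketch {
public:
    explicit ValueProfileSketch(int threshold) : threshold_(threshold) {}

    void record(T v) {
        if (last_ && *last_ == v) {
            ++hits_;
        } else {
            last_ = v;
            hits_ = 1;
        }
    }

    bool empty() const { return !last_.has_value(); }

    // Assumed stability rule: the same value seen `threshold_` times in a row.
    bool stable() const { return last_.has_value() && hits_ >= threshold_; }

    T value() const {
        assert(last_.has_value());
        return *last_;
    }

private:
    std::optional<T> last_;
    int hits_ = 0;
    int threshold_;
};

// Hypothetical dispatcher: profiles calls while running the generic variant,
// then builds a variant specialized for the stable value and routes matching
// calls to it. Non-matching calls still take the generic variant.
class DispatcherSketch {
public:
    using Fn = std::function<std::int64_t(std::int64_t)>;
    using Specializer = std::function<Fn(std::int64_t)>;

    DispatcherSketch(Fn generic, Specializer specializeFor, int threshold)
        : generic_(std::move(generic)),
          specializeFor_(std::move(specializeFor)),
          profile_(threshold) {}

    std::int64_t operator()(std::int64_t x) {
        if (!specialized_) {
            profile_.record(x);
            if (profile_.stable()) {
                constant_ = profile_.value();
                specialized_ = specializeFor_(constant_);  // built once
            }
            return generic_(x);
        }
        // Guarded fast path: the specialized variant is only valid for constant_.
        return x == constant_ ? specialized_(x) : generic_(x);
    }

    bool isSpecialized() const { return static_cast<bool>(specialized_); }

private:
    Fn generic_;
    Specializer specializeFor_;
    ValueProfileSketch<std::int64_t> profile_;
    Fn specialized_;
    std::int64_t constant_ = 0;
};
```

With a threshold of 3, the first three calls with argument `4` run the generic variant; after that, calls with `4` take the specialized variant and any other argument still falls back to the generic one. Keeping the generic path reachable is what makes the specialization safe when the profiled value later changes.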
Force-pushed 30c1e3b to 20bc8ad.
Introduces a header-only API in the specialization plugin that lets a traced kernel mark arguments for value specialization. While the profile `p` is still empty, `specialize(arg, p)` emits a proxy call that updates the profile. Once the profile has stabilized on a value `c`, it instead emits a traced dispatcher of the form `if (arg == c) { nautilus_assume(arg == c); arg = c; }`, so that downstream uses of the argument can be constant-folded by `ConstantPropagationPhase` and LLVM.
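Why the guard helps can be illustrated with a small plain-C++ model of the code shape after the profile has stabilized at some constant `C`. Everything here is hypothetical: `bodyGeneric`, `bodySpecialized`, and the arithmetic are made up, and a template parameter stands in for the constant that tracing would bake into the generated code. In the plugin itself, the folding is attributed to `ConstantPropagationPhase` and LLVM, not to C++ templates.

```cpp
#include <cstdint>

// Generic body: `arg` is a runtime value, so `arg % 4` and `x * arg`
// must be computed on every call.
constexpr std::int64_t bodyGeneric(std::int64_t arg, std::int64_t x) {
    return x * arg + (arg % 4);
}

// Hypothetical model of the code shape once the profile is stable at C.
// The template parameter C stands in for the constant baked into the trace.
template <std::int64_t C>
constexpr std::int64_t bodySpecialized(std::int64_t arg, std::int64_t x) {
    if (arg == C) {
        // Mirrors `nautilus_assume(arg == c); arg = c;`: from here on only the
        // constant is used, so `C % 4` is evaluated at compile time.
        constexpr std::int64_t folded = C % 4;
        return x * C + folded;
    }
    // Guard failed: fall back to the generic computation.
    return bodyGeneric(arg, x);
}
```

Both paths compute the same result for every input; the specialized branch simply exposes the argument as a constant, which is the property the traced `arg = c` assignment is meant to give downstream passes.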