Add annotated load operations with LLVM metadata hints for val_ptr #208
Open
PhilippGrulich wants to merge 1 commit into main
Introduce load_invariant() and load_nonnull() APIs that flow through the tracing/IR pipeline and emit LLVM !invariant.load metadata via MLIR's LoadOp attributes. This enables LLVM optimizations like LICM for loads from known-immutable context structures in query compilation.

- Add LOAD_INVARIANT and LOAD_NONNULL trace operations
- Extend LoadOperation with LoadHints (flags + range/deref fields)
- Wire through TraceToIRConversionPhase with processHintedLoad()
- Set mlir::LLVM::LoadOp invariant attribute in MLIR backend
- Non-MLIR backends transparently ignore hints

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
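To illustrate why an invariant-load hint matters for LICM (loop-invariant code motion), here is a minimal plain C++ sketch of the transformation the optimizer is allowed to perform once a load is known to be invariant. It does not use the load_invariant() API from this PR; QueryContext, scaleNaive, and scaleHoisted are hypothetical names chosen only for this example.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an immutable context structure (not a type from this PR).
struct QueryContext {
    std::int64_t scale;
};

// Without a hint, the compiler has to assume that the store to out[i] may
// alias ctx->scale, so ctx->scale is conceptually reloaded in every iteration.
void scaleNaive(const QueryContext* ctx, const std::int64_t* in,
                std::int64_t* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i] * ctx->scale;
    }
}

// The shape LICM can produce when the load is known to be invariant:
// the load is hoisted out of the loop and performed once.
void scaleHoisted(const QueryContext* ctx, const std::int64_t* in,
                  std::int64_t* out, std::size_t n) {
    const std::int64_t scale = ctx->scale;  // single load before the loop
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i] * scale;
    }
}
```

Both functions compute the same result as long as the context is not modified while the loop runs, which is the guarantee an !invariant.load annotation asserts to LLVM. If that guarantee is violated, the behavior is undefined, so the hint should only be applied to loads from memory that is truly immutable.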
Tracing Benchmark
| Benchmark suite | Current: d33ffa1 | Previous: f51ebe3 | Ratio |
|---|---|---|---|
| exec_mlir_add | 3.62919 ns (± 0.710418) | 3.4399 ns (± 0.256188) | 1.06 |
| exec_mlir_fibonacci | 4.98613 us (± 907.598) | 4.85884 us (± 505.209) | 1.03 |
| exec_mlir_sum | 523.265 us (± 17.9864) | 757.372 us (± 21.5287) | 0.69 |
| exec_cpp_add | 5.41222 ns (± 0.787407) | 5.42716 ns (± 0.643648) | 1.00 |
| exec_cpp_fibonacci | 94.8604 us (± 8.94938) | 94.9613 us (± 7.72533) | 1.00 |
| exec_cpp_sum | 35.9968 ms (± 160.543) | 36.1061 ms (± 88.2329) | 1.00 |
| exec_bc_add | 42.1606 ns (± 0.299163) | 46.6504 ns (± 7.90573) | 0.90 |
| exec_bc_fibonacci | 928.594 us (± 8.09813) | 818.424 us (± 13.2887) | 1.13 |
| exec_bc_sum | 199.746 ms (± 399.43) | 176.076 ms (± 1.29583) | 1.13 |
| exec_asmjit_add | 3.46415 ns (± 0.49568) | 3.59536 ns (± 0.444299) | 0.96 |
| exec_asmjit_fibonacci | 21.5924 us (± 2.6543) | 21.522 us (± 2.09311) | 1.00 |
| exec_asmjit_sum | 4.45694 ms (± 25.5995) | 4.58218 ms (± 49.4414) | 0.97 |
| trace_add | 2.4397 us (± 166.894) | 3.16624 us (± 902.417) | 0.77 |
| completing_trace_add | 2.44949 us (± 204.523) | 2.73966 us (± 452.086) | 0.89 |
| trace_ifThenElse | 11.5819 us (± 2.21797) | 12.0375 us (± 2.09207) | 0.96 |
| completing_trace_ifThenElse | 5.19741 us (± 622.555) | 5.70137 us (± 1.00096) | 0.91 |
| trace_deeplyNestedIfElse | 34.5569 us (± 3.96859) | 36.9404 us (± 7.64993) | 0.94 |
| completing_trace_deeplyNestedIfElse | 15.1995 us (± 1.95764) | 17.1665 us (± 3.58801) | 0.89 |
| trace_loop | 11.2286 us (± 1.61033) | 11.9224 us (± 1.95407) | 0.94 |
| completing_trace_loop | 5.30012 us (± 873.158) | 5.64838 us (± 876.29) | 0.94 |
| trace_ifInsideLoop | 22.1906 us (± 3.1822) | 23.4585 us (± 3.68738) | 0.95 |
| completing_trace_ifInsideLoop | 9.75766 us (± 1.32196) | 10.8813 us (± 2.06765) | 0.90 |
| trace_loopDirectCall | 11.3486 us (± 1.83005) | 12.0761 us (± 2.07061) | 0.94 |
| completing_trace_loopDirectCall | 5.42825 us (± 930.228) | 5.74402 us (± 967.082) | 0.95 |
| trace_pointerLoop | 17.2482 us (± 3.15697) | 18.4038 us (± 3.58652) | 0.94 |
| completing_trace_pointerLoop | 11.1593 us (± 1.48861) | 12.2491 us (± 2.2413) | 0.91 |
| trace_staticLoop | 9.09606 us (± 1.04045) | 10.0716 us (± 1.68548) | 0.90 |
| completing_trace_staticLoop | 8.75142 us (± 1.03842) | 9.63622 us (± 1.71665) | 0.91 |
| trace_fibonacci | 12.7676 us (± 1.41013) | 13.5531 us (± 2.25598) | 0.94 |
| completing_trace_fibonacci | 6.62304 us (± 690.244) | 7.90938 us (± 2.93657) | 0.84 |
| trace_gcd | 10.4655 us (± 1.4476) | 11.3036 us (± 2.07997) | 0.93 |
| completing_trace_gcd | 4.42904 us (± 506.342) | 5.00729 us (± 882.318) | 0.88 |
| trace_nestedIf10 | 55.9283 us (± 7.90516) | 59.154 us (± 10.3985) | 0.95 |
| completing_trace_nestedIf10 | 55.9538 us (± 6.84908) | 59.6604 us (± 11.6553) | 0.94 |
| trace_nestedIf100 | 1.78564 ms (± 41.0493) | 1.82749 ms (± 211.093) | 0.98 |
| completing_trace_nestedIf100 | 1.81218 ms (± 17.9716) | 1.83584 ms (± 184.443) | 0.99 |
| trace_chainedIf10 | 137.404 us (± 10.7987) | 141.329 us (± 14.1364) | 0.97 |
| completing_trace_chainedIf10 | 70.8281 us (± 7.61094) | 75.6039 us (± 11.9562) | 0.94 |
| trace_chainedIf100 | 5.16105 ms (± 48.8846) | 5.20817 ms (± 151.216) | 0.99 |
| completing_trace_chainedIf100 | 2.85109 ms (± 32.3309) | 2.96615 ms (± 113.982) | 0.96 |
| comp_mlir_add | 8.29847 ms (± 304.856) | 8.7298 ms (± 186.806) | 0.95 |
| comp_mlir_ifThenElse | 8.83767 ms (± 192.437) | 9.52224 ms (± 554.446) | 0.93 |
| comp_mlir_deeplyNestedIfElse | 7.81847 ms (± 385.493) | 8.25527 ms (± 419.822) | 0.95 |
| comp_mlir_loop | 9.96584 ms (± 382.319) | 10.6419 ms (± 251.073) | 0.94 |
| comp_mlir_ifInsideLoop | 31.9745 ms (± 626.95) | 33.1487 ms (± 263.286) | 0.96 |
| comp_mlir_loopDirectCall | 14.6168 ms (± 280.246) | 15.5211 ms (± 268.299) | 0.94 |
| comp_mlir_pointerLoop | 30.7214 ms (± 528.733) | 32.2398 ms (± 260.64) | 0.95 |
| comp_mlir_staticLoop | 7.69017 ms (± 167.072) | 8.10823 ms (± 156.814) | 0.95 |
| comp_mlir_fibonacci | 13.1282 ms (± 172.343) | 14.2322 ms (± 271.984) | 0.92 |
| comp_mlir_gcd | 12.4343 ms (± 295.222) | 13.121 ms (± 269.554) | 0.95 |
| comp_mlir_nestedIf10 | 13.2479 ms (± 278.781) | 14.0932 ms (± 268.431) | 0.94 |
| comp_mlir_nestedIf100 | 27.4854 ms (± 239.925) | 28.7419 ms (± 453.804) | 0.96 |
| comp_mlir_chainedIf10 | 12.3805 ms (± 332.377) | 12.9803 ms (± 243.113) | 0.95 |
| comp_mlir_chainedIf100 | 23.127 ms (± 1.04154) | 24.4755 ms (± 215.14) | 0.94 |
| comp_cpp_add | 24.9116 ms (± 693.489) | | |
| comp_cpp_ifThenElse | 25.3445 ms (± 435.755) | | |
| comp_cpp_deeplyNestedIfElse | 26.586 ms (± 592.64) | | |
| comp_cpp_loop | 25.4753 ms (± 417.033) | | |
| comp_cpp_ifInsideLoop | 26.5539 ms (± 613.797) | | |
| comp_cpp_loopDirectCall | 25.8106 ms (± 523.211) | | |
| comp_cpp_pointerLoop | 26.0284 ms (± 539.338) | | |
| comp_cpp_staticLoop | 25.1583 ms (± 454.819) | | |
| comp_cpp_fibonacci | 25.7165 ms (± 466.205) | | |
| comp_cpp_gcd | 25.5392 ms (± 676.323) | | |
| comp_cpp_nestedIf10 | 29.5343 ms (± 612.547) | | |
| comp_cpp_nestedIf100 | 62.1611 ms (± 667.996) | | |
| comp_cpp_chainedIf10 | 31.0521 ms (± 734.048) | | |
| comp_cpp_chainedIf100 | 91.82 ms (± 1.27068) | | |
| comp_bc_add | 14.6068 us (± 2.66166) | | |
| comp_bc_ifThenElse | 17.7616 us (± 2.91562) | | |
| comp_bc_deeplyNestedIfElse | 22.1052 us (± 3.56459) | | |
| comp_bc_loop | 18.3934 us (± 3.21698) | | |
| comp_bc_ifInsideLoop | 20.5073 us (± 3.00608) | | |
| comp_bc_loopDirectCall | 18.7633 us (± 2.74779) | | |
| comp_bc_pointerLoop | 19.8878 us (± 2.37651) | | |
| comp_bc_staticLoop | 16.8842 us (± 3.89217) | | |
| comp_bc_fibonacci | 18.3374 us (± 2.80027) | | |
| comp_bc_gcd | 17.9666 us (± 2.81332) | | |
| comp_bc_nestedIf10 | 34.8735 us (± 3.81846) | | |
| comp_bc_nestedIf100 | 177.094 us (± 11.1456) | | |
| comp_bc_chainedIf10 | 48.4883 us (± 6.2) | | |
| comp_bc_chainedIf100 | 282.103 us (± 12.8435) | | |
| comp_asmjit_add | 21.0823 us (± 4.14156) | | |
| comp_asmjit_ifThenElse | 33.3408 us (± 4.71244) | | |
| comp_asmjit_deeplyNestedIfElse | 57.7579 us (± 8.82259) | | |
| comp_asmjit_loop | 35.218 us (± 4.98661) | | |
| comp_asmjit_ifInsideLoop | 58.0365 us (± 10.5425) | | |
| comp_asmjit_loopDirectCall | 47.095 us (± 10.7088) | | |
| comp_asmjit_pointerLoop | 48.2895 us (± 7.85012) | | |
| comp_asmjit_staticLoop | 27.9076 us (± 3.80376) | | |
| comp_asmjit_fibonacci | 43.4024 us (± 7.99657) | | |
| comp_asmjit_gcd | 34.8136 us (± 3.72297) | | |
| comp_asmjit_nestedIf10 | 107.628 us (± 10.8575) | | |
| comp_asmjit_nestedIf100 | 1.1314 ms (± 18.4949) | | |
| comp_asmjit_chainedIf10 | 164.559 us (± 15.4997) | | |
| comp_asmjit_chainedIf100 | 2.29047 ms (± 45.3975) | | |
| ir_add | 838.478 ns (± 53.0126) | 941.34 ns (± 196.289) | 0.89 |
| ir_ifThenElse | 2.46507 us (± 207.618) | 2.68536 us (± 395.882) | 0.92 |
| ir_deeplyNestedIfElse | 6.47661 us (± 636.427) | 6.97561 us (± 1.01891) | 0.93 |
| ir_loop | 2.92065 us (± 203.694) | 3.12539 us (± 474.086) | 0.93 |
| ir_ifInsideLoop | 5.50702 us (± 410.273) | 6.02447 us (± 888.076) | 0.91 |
| ir_loopDirectCall | 3.14196 us (± 237.018) | 3.36947 us (± 519.547) | 0.93 |
| ir_pointerLoop | 3.72539 us (± 512.569) | 4.07891 us (± 562.593) | 0.91 |
| ir_staticLoop | 2.23983 us (± 163.38) | 2.51948 us (± 368.522) | 0.89 |
| ir_fibonacci | 3.15024 us (± 255.63) | 3.47394 us (± 610.161) | 0.91 |
| ir_gcd | 2.62179 us (± 356.258) | 2.81634 us (± 432.104) | 0.93 |
| ir_nestedIf10 | 15.6761 us (± 1.14105) | 16.9811 us (± 2.25068) | 0.92 |
| ir_nestedIf100 | 188.241 us (± 8.15599) | 193.594 us (± 9.55543) | 0.97 |
| ir_chainedIf10 | 28.5464 us (± 2.54051) | 29.9914 us (± 2.92488) | 0.95 |
| ir_chainedIf100 | 365.271 us (± 14.6009) | 364.92 us (± 10.804) | 1.00 |
| ssa_add | 186.197 ns (± 15.8129) | 209.202 ns (± 31.4313) | 0.89 |
| ssa_ifThenElse | 462.902 ns (± 45.9996) | 496.015 ns (± 62.277) | 0.93 |
| ssa_deeplyNestedIfElse | 1.20008 us (± 85.6029) | 1.24287 us (± 163.114) | 0.97 |
| ssa_loop | 492.539 ns (± 30.8647) | 517.327 ns (± 70.5396) | 0.95 |
| ssa_ifInsideLoop | 915.097 ns (± 60.9139) | 981.206 ns (± 134.06) | 0.93 |
| ssa_loopDirectCall | 493.466 ns (± 33.1374) | 528.178 ns (± 64.8561) | 0.93 |
| ssa_pointerLoop | 595.862 ns (± 27.1115) | 617.838 ns (± 71.4512) | 0.96 |
| ssa_staticLoop | 431.836 ns (± 25.5933) | 434.379 ns (± 73.0346) | 0.99 |
| ssa_fibonacci | 508.79 ns (± 26.9689) | 556.56 ns (± 93.6299) | 0.91 |
| ssa_gcd | 466.194 ns (± 49.4904) | 493.534 ns (± 57.2873) | 0.94 |
This comment was automatically generated by workflow using github-action-benchmark.
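As a reading aid for the table: the Ratio column appears to be the current mean time divided by the previous mean time, rounded to two decimals, so values below 1.00 mean the current commit ran faster. For exec_mlir_sum, 523.265 / 757.372 is roughly 0.69. The helpers below are hypothetical and only reproduce that arithmetic; they are not part of github-action-benchmark.

```cpp
#include <cmath>

// Hypothetical helper: ratio of the current mean time to the previous mean time.
// Both arguments must be expressed in the same unit (e.g. both in microseconds).
double benchmarkRatio(double currentMean, double previousMean) {
    return currentMean / previousMean;
}

// Round to two decimals, matching how the Ratio column appears to be displayed.
double roundToTwoDecimals(double value) {
    return std::round(value * 100.0) / 100.0;
}
```

Applied to exec_bc_fibonacci (928.594 us current, 818.424 us previous), the same arithmetic gives about 1.13, matching the table and indicating a slowdown for that benchmark.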