Add annotated load operations with LLVM metadata hints for val_ptr #208

Open
PhilippGrulich wants to merge 1 commit into main from claude/add-annotated-load-hints

Conversation

@PhilippGrulich
Member

Introduce load_invariant() and load_nonnull() APIs that flow through the tracing/IR pipeline and emit LLVM !invariant.load metadata via MLIR's LoadOp attributes. This enables LLVM optimizations like LICM for loads from known-immutable context structures in query compilation.
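As a rough illustration of why the invariant hint matters for LICM, the plain C++ sketch below contrasts a loop that reads a context field on every iteration with the hoisted form an optimizer is allowed to produce once the load is known to be invariant. `QueryContext`, `scaleNaive`, and `scaleHoisted` are hypothetical names chosen for this example and are not part of the PR; the sketch only mirrors the transformation by hand and does not use the new APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an immutable per-query context structure.
struct QueryContext {
    std::int64_t scale;
};

// Without extra information, a compiler may have to assume that the store to
// out[i] can alias ctx->scale, so the field is reloaded on every iteration.
void scaleNaive(const QueryContext* ctx, const std::int64_t* in,
                std::int64_t* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i] * ctx->scale;
    }
}

// Hand-written equivalent of what loop-invariant code motion can do when the
// load is marked invariant: the field is read once before the loop.
void scaleHoisted(const QueryContext* ctx, const std::int64_t* in,
                  std::int64_t* out, std::size_t n) {
    assert(ctx != nullptr);
    if (n == 0) {
        return;  // keep the load guarded by the loop's entry condition
    }
    const std::int64_t scale = ctx->scale;  // single load, hoisted out of the loop
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i] * scale;
    }
}
```

Both functions compute the same result as long as the context is not modified while the loop runs, which is the guarantee the caller gives by requesting an invariant load.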

  • Add LOAD_INVARIANT and LOAD_NONNULL trace operations
  • Extend LoadOperation with LoadHints (flags + range/deref fields)
  • Wire through TraceToIRConversionPhase with processHintedLoad()
  • Set mlir::LLVM::LoadOp invariant attribute in MLIR backend
  • Non-MLIR backends transparently ignore hints
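The list above can be pictured with a small, self-contained C++17 sketch. It is not the code from this PR: the layout of `LoadHints`, the flag constants, and the helper `metadataKinds` are assumptions made for illustration. Only the metadata kind names (`invariant.load`, `nonnull`, `range`, `dereferenceable`) are real LLVM load metadata. The sketch shows a hint record with flags plus optional range/deref fields, and a lowering step that a backend without hint support can skip entirely.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical flag bits; the real names in the PR may differ.
constexpr std::uint8_t kHintInvariant = 1u << 0;
constexpr std::uint8_t kHintNonNull = 1u << 1;

// Hypothetical shape of a hint record: flags plus optional range/deref fields.
struct LoadHints {
    std::uint8_t flags = 0;
    std::optional<std::pair<std::int64_t, std::int64_t>> range;  // [lo, hi)
    std::optional<std::uint64_t> dereferenceableBytes;

    bool has(std::uint8_t flag) const { return (flags & flag) != 0; }
};

// Returns the LLVM metadata kinds a hint-aware backend could attach to a load.
// A backend that does not support hints gets an empty list, so the load is
// lowered exactly as an unannotated load would be.
std::vector<std::string> metadataKinds(const LoadHints& hints,
                                       bool backendSupportsHints) {
    std::vector<std::string> kinds;
    if (!backendSupportsHints) {
        return kinds;
    }
    if (hints.has(kHintInvariant)) {
        kinds.push_back("invariant.load");
    }
    if (hints.has(kHintNonNull)) {
        kinds.push_back("nonnull");
    }
    if (hints.range) {
        assert(hints.range->first < hints.range->second);  // non-empty range
        kinds.push_back("range");
    }
    if (hints.dereferenceableBytes) {
        kinds.push_back("dereferenceable");
    }
    return kinds;
}
```

Keeping the hints in a side record rather than in the operation's semantics is what lets the other backends ignore them without any behavioral change.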

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

@github-actions github-actions bot left a comment

Tracing Benchmark

Details

| Benchmark suite | Current: d33ffa1 | Previous: f51ebe3 | Ratio |
|-|-|-|-|
| exec_mlir_add | 3.62919 ns (± 0.710418) | 3.4399 ns (± 0.256188) | 1.06 |
| exec_mlir_fibonacci | 4.98613 us (± 907.598) | 4.85884 us (± 505.209) | 1.03 |
| exec_mlir_sum | 523.265 us (± 17.9864) | 757.372 us (± 21.5287) | 0.69 |
| exec_cpp_add | 5.41222 ns (± 0.787407) | 5.42716 ns (± 0.643648) | 1.00 |
| exec_cpp_fibonacci | 94.8604 us (± 8.94938) | 94.9613 us (± 7.72533) | 1.00 |
| exec_cpp_sum | 35.9968 ms (± 160.543) | 36.1061 ms (± 88.2329) | 1.00 |
| exec_bc_add | 42.1606 ns (± 0.299163) | 46.6504 ns (± 7.90573) | 0.90 |
| exec_bc_fibonacci | 928.594 us (± 8.09813) | 818.424 us (± 13.2887) | 1.13 |
| exec_bc_sum | 199.746 ms (± 399.43) | 176.076 ms (± 1.29583) | 1.13 |
| exec_asmjit_add | 3.46415 ns (± 0.49568) | 3.59536 ns (± 0.444299) | 0.96 |
| exec_asmjit_fibonacci | 21.5924 us (± 2.6543) | 21.522 us (± 2.09311) | 1.00 |
| exec_asmjit_sum | 4.45694 ms (± 25.5995) | 4.58218 ms (± 49.4414) | 0.97 |
| trace_add | 2.4397 us (± 166.894) | 3.16624 us (± 902.417) | 0.77 |
| completing_trace_add | 2.44949 us (± 204.523) | 2.73966 us (± 452.086) | 0.89 |
| trace_ifThenElse | 11.5819 us (± 2.21797) | 12.0375 us (± 2.09207) | 0.96 |
| completing_trace_ifThenElse | 5.19741 us (± 622.555) | 5.70137 us (± 1.00096) | 0.91 |
| trace_deeplyNestedIfElse | 34.5569 us (± 3.96859) | 36.9404 us (± 7.64993) | 0.94 |
| completing_trace_deeplyNestedIfElse | 15.1995 us (± 1.95764) | 17.1665 us (± 3.58801) | 0.89 |
| trace_loop | 11.2286 us (± 1.61033) | 11.9224 us (± 1.95407) | 0.94 |
| completing_trace_loop | 5.30012 us (± 873.158) | 5.64838 us (± 876.29) | 0.94 |
| trace_ifInsideLoop | 22.1906 us (± 3.1822) | 23.4585 us (± 3.68738) | 0.95 |
| completing_trace_ifInsideLoop | 9.75766 us (± 1.32196) | 10.8813 us (± 2.06765) | 0.90 |
| trace_loopDirectCall | 11.3486 us (± 1.83005) | 12.0761 us (± 2.07061) | 0.94 |
| completing_trace_loopDirectCall | 5.42825 us (± 930.228) | 5.74402 us (± 967.082) | 0.95 |
| trace_pointerLoop | 17.2482 us (± 3.15697) | 18.4038 us (± 3.58652) | 0.94 |
| completing_trace_pointerLoop | 11.1593 us (± 1.48861) | 12.2491 us (± 2.2413) | 0.91 |
| trace_staticLoop | 9.09606 us (± 1.04045) | 10.0716 us (± 1.68548) | 0.90 |
| completing_trace_staticLoop | 8.75142 us (± 1.03842) | 9.63622 us (± 1.71665) | 0.91 |
| trace_fibonacci | 12.7676 us (± 1.41013) | 13.5531 us (± 2.25598) | 0.94 |
| completing_trace_fibonacci | 6.62304 us (± 690.244) | 7.90938 us (± 2.93657) | 0.84 |
| trace_gcd | 10.4655 us (± 1.4476) | 11.3036 us (± 2.07997) | 0.93 |
| completing_trace_gcd | 4.42904 us (± 506.342) | 5.00729 us (± 882.318) | 0.88 |
| trace_nestedIf10 | 55.9283 us (± 7.90516) | 59.154 us (± 10.3985) | 0.95 |
| completing_trace_nestedIf10 | 55.9538 us (± 6.84908) | 59.6604 us (± 11.6553) | 0.94 |
| trace_nestedIf100 | 1.78564 ms (± 41.0493) | 1.82749 ms (± 211.093) | 0.98 |
| completing_trace_nestedIf100 | 1.81218 ms (± 17.9716) | 1.83584 ms (± 184.443) | 0.99 |
| trace_chainedIf10 | 137.404 us (± 10.7987) | 141.329 us (± 14.1364) | 0.97 |
| completing_trace_chainedIf10 | 70.8281 us (± 7.61094) | 75.6039 us (± 11.9562) | 0.94 |
| trace_chainedIf100 | 5.16105 ms (± 48.8846) | 5.20817 ms (± 151.216) | 0.99 |
| completing_trace_chainedIf100 | 2.85109 ms (± 32.3309) | 2.96615 ms (± 113.982) | 0.96 |
| comp_mlir_add | 8.29847 ms (± 304.856) | 8.7298 ms (± 186.806) | 0.95 |
| comp_mlir_ifThenElse | 8.83767 ms (± 192.437) | 9.52224 ms (± 554.446) | 0.93 |
| comp_mlir_deeplyNestedIfElse | 7.81847 ms (± 385.493) | 8.25527 ms (± 419.822) | 0.95 |
| comp_mlir_loop | 9.96584 ms (± 382.319) | 10.6419 ms (± 251.073) | 0.94 |
| comp_mlir_ifInsideLoop | 31.9745 ms (± 626.95) | 33.1487 ms (± 263.286) | 0.96 |
| comp_mlir_loopDirectCall | 14.6168 ms (± 280.246) | 15.5211 ms (± 268.299) | 0.94 |
| comp_mlir_pointerLoop | 30.7214 ms (± 528.733) | 32.2398 ms (± 260.64) | 0.95 |
| comp_mlir_staticLoop | 7.69017 ms (± 167.072) | 8.10823 ms (± 156.814) | 0.95 |
| comp_mlir_fibonacci | 13.1282 ms (± 172.343) | 14.2322 ms (± 271.984) | 0.92 |
| comp_mlir_gcd | 12.4343 ms (± 295.222) | 13.121 ms (± 269.554) | 0.95 |
| comp_mlir_nestedIf10 | 13.2479 ms (± 278.781) | 14.0932 ms (± 268.431) | 0.94 |
| comp_mlir_nestedIf100 | 27.4854 ms (± 239.925) | 28.7419 ms (± 453.804) | 0.96 |
| comp_mlir_chainedIf10 | 12.3805 ms (± 332.377) | 12.9803 ms (± 243.113) | 0.95 |
| comp_mlir_chainedIf100 | 23.127 ms (± 1.04154) | 24.4755 ms (± 215.14) | 0.94 |
| comp_cpp_add | 24.9116 ms (± 693.489) | | |
| comp_cpp_ifThenElse | 25.3445 ms (± 435.755) | | |
| comp_cpp_deeplyNestedIfElse | 26.586 ms (± 592.64) | | |
| comp_cpp_loop | 25.4753 ms (± 417.033) | | |
| comp_cpp_ifInsideLoop | 26.5539 ms (± 613.797) | | |
| comp_cpp_loopDirectCall | 25.8106 ms (± 523.211) | | |
| comp_cpp_pointerLoop | 26.0284 ms (± 539.338) | | |
| comp_cpp_staticLoop | 25.1583 ms (± 454.819) | | |
| comp_cpp_fibonacci | 25.7165 ms (± 466.205) | | |
| comp_cpp_gcd | 25.5392 ms (± 676.323) | | |
| comp_cpp_nestedIf10 | 29.5343 ms (± 612.547) | | |
| comp_cpp_nestedIf100 | 62.1611 ms (± 667.996) | | |
| comp_cpp_chainedIf10 | 31.0521 ms (± 734.048) | | |
| comp_cpp_chainedIf100 | 91.82 ms (± 1.27068) | | |
| comp_bc_add | 14.6068 us (± 2.66166) | | |
| comp_bc_ifThenElse | 17.7616 us (± 2.91562) | | |
| comp_bc_deeplyNestedIfElse | 22.1052 us (± 3.56459) | | |
| comp_bc_loop | 18.3934 us (± 3.21698) | | |
| comp_bc_ifInsideLoop | 20.5073 us (± 3.00608) | | |
| comp_bc_loopDirectCall | 18.7633 us (± 2.74779) | | |
| comp_bc_pointerLoop | 19.8878 us (± 2.37651) | | |
| comp_bc_staticLoop | 16.8842 us (± 3.89217) | | |
| comp_bc_fibonacci | 18.3374 us (± 2.80027) | | |
| comp_bc_gcd | 17.9666 us (± 2.81332) | | |
| comp_bc_nestedIf10 | 34.8735 us (± 3.81846) | | |
| comp_bc_nestedIf100 | 177.094 us (± 11.1456) | | |
| comp_bc_chainedIf10 | 48.4883 us (± 6.2) | | |
| comp_bc_chainedIf100 | 282.103 us (± 12.8435) | | |
| comp_asmjit_add | 21.0823 us (± 4.14156) | | |
| comp_asmjit_ifThenElse | 33.3408 us (± 4.71244) | | |
| comp_asmjit_deeplyNestedIfElse | 57.7579 us (± 8.82259) | | |
| comp_asmjit_loop | 35.218 us (± 4.98661) | | |
| comp_asmjit_ifInsideLoop | 58.0365 us (± 10.5425) | | |
| comp_asmjit_loopDirectCall | 47.095 us (± 10.7088) | | |
| comp_asmjit_pointerLoop | 48.2895 us (± 7.85012) | | |
| comp_asmjit_staticLoop | 27.9076 us (± 3.80376) | | |
| comp_asmjit_fibonacci | 43.4024 us (± 7.99657) | | |
| comp_asmjit_gcd | 34.8136 us (± 3.72297) | | |
| comp_asmjit_nestedIf10 | 107.628 us (± 10.8575) | | |
| comp_asmjit_nestedIf100 | 1.1314 ms (± 18.4949) | | |
| comp_asmjit_chainedIf10 | 164.559 us (± 15.4997) | | |
| comp_asmjit_chainedIf100 | 2.29047 ms (± 45.3975) | | |
| ir_add | 838.478 ns (± 53.0126) | 941.34 ns (± 196.289) | 0.89 |
| ir_ifThenElse | 2.46507 us (± 207.618) | 2.68536 us (± 395.882) | 0.92 |
| ir_deeplyNestedIfElse | 6.47661 us (± 636.427) | 6.97561 us (± 1.01891) | 0.93 |
| ir_loop | 2.92065 us (± 203.694) | 3.12539 us (± 474.086) | 0.93 |
| ir_ifInsideLoop | 5.50702 us (± 410.273) | 6.02447 us (± 888.076) | 0.91 |
| ir_loopDirectCall | 3.14196 us (± 237.018) | 3.36947 us (± 519.547) | 0.93 |
| ir_pointerLoop | 3.72539 us (± 512.569) | 4.07891 us (± 562.593) | 0.91 |
| ir_staticLoop | 2.23983 us (± 163.38) | 2.51948 us (± 368.522) | 0.89 |
| ir_fibonacci | 3.15024 us (± 255.63) | 3.47394 us (± 610.161) | 0.91 |
| ir_gcd | 2.62179 us (± 356.258) | 2.81634 us (± 432.104) | 0.93 |
| ir_nestedIf10 | 15.6761 us (± 1.14105) | 16.9811 us (± 2.25068) | 0.92 |
| ir_nestedIf100 | 188.241 us (± 8.15599) | 193.594 us (± 9.55543) | 0.97 |
| ir_chainedIf10 | 28.5464 us (± 2.54051) | 29.9914 us (± 2.92488) | 0.95 |
| ir_chainedIf100 | 365.271 us (± 14.6009) | 364.92 us (± 10.804) | 1.00 |
| ssa_add | 186.197 ns (± 15.8129) | 209.202 ns (± 31.4313) | 0.89 |
| ssa_ifThenElse | 462.902 ns (± 45.9996) | 496.015 ns (± 62.277) | 0.93 |
| ssa_deeplyNestedIfElse | 1.20008 us (± 85.6029) | 1.24287 us (± 163.114) | 0.97 |
| ssa_loop | 492.539 ns (± 30.8647) | 517.327 ns (± 70.5396) | 0.95 |
| ssa_ifInsideLoop | 915.097 ns (± 60.9139) | 981.206 ns (± 134.06) | 0.93 |
| ssa_loopDirectCall | 493.466 ns (± 33.1374) | 528.178 ns (± 64.8561) | 0.93 |
| ssa_pointerLoop | 595.862 ns (± 27.1115) | 617.838 ns (± 71.4512) | 0.96 |
| ssa_staticLoop | 431.836 ns (± 25.5933) | 434.379 ns (± 73.0346) | 0.99 |
| ssa_fibonacci | 508.79 ns (± 26.9689) | 556.56 ns (± 93.6299) | 0.91 |
| ssa_gcd | 466.194 ns (± 49.4904) | 493.534 ns (± 57.2873) | 0.94 |

This comment was automatically generated by workflow using github-action-benchmark.
