feat(ethexe/processor): instrument code lazily inside processing by playX18 · Pull Request #5396 · gear-tech/gear

playX18 · 2026-04-30T07:20:34Z

Resolves #4681

Note

Medium Risk
Adds on-the-fly WASM instrumentation and DB writes inside the hot queue-processing path, which could affect determinism and performance, and introduces a new failure mode when the original code is absent.

Overview
Enables lazy code instrumentation during program execution: when processing a program queue (including overlay mode), if the current-runtime instrumented_code/code_metadata is missing in the DB, the processor now instruments the stored original_code via a runtime instance and persists the results.

This threads InstanceCreator into instrumented_code_and_metadata, exposes it on CommonRunContext, adds a new MissingOriginalCodeForProgram error, and includes a regression test ensuring processing populates missing instrumentation for valid code.
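A minimal sketch of the lazy fallback described above, for readers who want the control flow at a glance. Every name here (`SketchDb`, `SketchInstrumenter`, `SketchError`, `lazy_instrumented_code_and_metadata`) is a hypothetical stand-in rather than the real ethexe type or signature, and the "instrumentation" is a dummy transform.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins; the real processor uses its own Database, CodeId and error types.
type CodeId = u32;

#[derive(Debug, PartialEq)]
enum SketchError {
    MissingOriginalCode(CodeId),
}

#[derive(Default)]
struct SketchDb {
    original: HashMap<CodeId, Vec<u8>>,
    instrumented: HashMap<CodeId, Vec<u8>>,
    metadata: HashMap<CodeId, usize>,
}

// Dummy instrumenter that counts how often it is invoked.
struct SketchInstrumenter {
    calls: usize,
}

impl SketchInstrumenter {
    fn instrument(&mut self, original: &[u8]) -> (Vec<u8>, usize) {
        self.calls += 1;
        let mut out = vec![0xAA]; // pretend prefix added by instrumentation
        out.extend_from_slice(original);
        (out, original.len())
    }
}

fn lazy_instrumented_code_and_metadata(
    db: &mut SketchDb,
    instrumenter: &mut SketchInstrumenter,
    code_id: CodeId,
) -> Result<(Vec<u8>, usize), SketchError> {
    // Fast path: both the instrumented code and its metadata are already stored.
    if let (Some(code), Some(meta)) = (db.instrumented.get(&code_id), db.metadata.get(&code_id)) {
        return Ok((code.clone(), *meta));
    }

    // Fallback: instrument the stored original code and persist both results.
    let original = db
        .original
        .get(&code_id)
        .ok_or(SketchError::MissingOriginalCode(code_id))?;
    let (code, meta) = instrumenter.instrument(original);
    db.instrumented.insert(code_id, code.clone());
    db.metadata.insert(code_id, meta);
    Ok((code, meta))
}
```

Under these assumptions, a second lookup for the same code id hits the fast path and does not instrument again.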

Reviewed by Cursor Bugbot for commit 5207bb5. Bugbot is set up for automated code reviews on this repo.

semanticdiff-com · 2026-04-30T07:20:37Z

Review changes with

Changed Files

File	Status
ethexe/processor/src/handling/overlaid.rs	75% smaller
ethexe/processor/src/handling/run/chunk_execution_spawn.rs	65% smaller
ethexe/processor/src/handling/run/mod.rs	47% smaller
ethexe/processor/src/lib.rs	0% smaller
ethexe/processor/src/tests.rs	0% smaller

gemini-code-assist · 2026-04-30T07:23:56Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a lazy instrumentation mechanism for program code within the processor. By deferring the instrumentation process until the code is actually required for execution, the system improves efficiency and robustness. The changes involve updating the run context to handle instrumentation dynamically and ensuring that missing instrumented code is generated on-the-fly from the original source.

Highlights

Lazy Instrumentation: Implemented lazy code instrumentation during the processing phase, ensuring that code is instrumented only when needed if it is missing from the database.
Error Handling: Added a new 'MissingOriginalCodeForProgram' error variant to handle cases where the original code is unavailable during the instrumentation process.
Testing: Added a new test case to verify that valid code is correctly instrumented when the current runtime instrumentation is missing.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, and Terms of Service. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request implements on-the-fly instrumentation for Gear programs within the ethexe processor. When instrumented code or metadata is missing from the database, the processor now fetches the original code, performs instrumentation using a Wasmtime instance, and persists the results back to the database. Feedback focuses on performance optimizations: specifically, noting that saving results to an overlaid database (common in RPC calls) may lead to redundant heavy instrumentation, and suggesting the reuse of Wasmtime instances across a processing pass to reduce instantiation overhead.

gemini-code-assist · 2026-04-30T07:30:01Z

+    db.set_instrumented_code(
+        ethexe_runtime_common::VERSION,
+        code_id,
+        instrumented_code.clone(),
+    );
+    db.set_code_metadata(code_id, code_metadata.clone());


When db is an overlaid database (e.g., during RPC execute_for_reply), these set calls only update the in-memory overlay. This means every RPC call for the same uninstrumented program will re-perform instrumentation. Since instrumentation is a computationally heavy operation, this might be a performance bottleneck for RPC. Ideally, deterministic instrumentation results should be cached in the base database, but the current Database abstraction makes this difficult without bypassing the overlay. Given this is a fallback path, it's acceptable, but worth noting for future optimization.
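To illustrate why overlay writes do not help later RPC calls, here is a small sketch of a copy-on-write overlay. `BaseDb`, `Overlay`, and `simulated_rpc_call` are hypothetical names invented for this example and do not mirror the real Database abstraction.

```rust
use std::collections::HashMap;

type CodeId = u32;

// Hypothetical persistent store.
#[derive(Default)]
struct BaseDb {
    instrumented: HashMap<CodeId, Vec<u8>>,
}

// Hypothetical overlay: reads fall through to the base, writes stay in memory.
struct Overlay<'a> {
    base: &'a BaseDb,
    writes: HashMap<CodeId, Vec<u8>>,
}

impl<'a> Overlay<'a> {
    fn new(base: &'a BaseDb) -> Self {
        Overlay { base, writes: HashMap::new() }
    }

    fn instrumented_code(&self, id: CodeId) -> Option<&Vec<u8>> {
        self.writes.get(&id).or_else(|| self.base.instrumented.get(&id))
    }

    fn set_instrumented_code(&mut self, id: CodeId, code: Vec<u8>) {
        self.writes.insert(id, code);
    }
}

// Simulates one RPC-style call; returns true if instrumentation had to run.
fn simulated_rpc_call(base: &BaseDb, id: CodeId) -> bool {
    let mut overlay = Overlay::new(base);
    let had_to_instrument = overlay.instrumented_code(id).is_none();
    if had_to_instrument {
        // Dummy bytes standing in for the instrumentation result.
        overlay.set_instrumented_code(id, vec![0xAA]);
    }
    // The overlay is dropped on return, so the base never sees the write.
    had_to_instrument
}
```

In this model, two consecutive calls for the same code id both pay the instrumentation cost until something writes the result into the base store.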

References

Code instrumentation is recognized as a computationally heavy 'main operation' in this repository, and its performance impact should be considered during implementation.

gemini-code-assist · 2026-04-30T07:30:01Z

+        .original_code(code_id)
+        .ok_or(ProcessorError::MissingOriginalCodeForProgram(code_id))?;
+
+    let mut instance = instance_creator.instantiate()?;


Instantiating a new Wasmtime instance for every uninstrumented code blob can be expensive. Given that code instrumentation is a heavy operation, this adds significant overhead. In spawn_chunk_execution, if a chunk contains multiple programs with uninstrumented code, this will happen multiple times. Although InstanceCreator::instantiate is optimized via InstancePre, you might consider instantiating the runtime once per chunk processing pass and reusing it for all required instrumentations within that pass to reduce overhead.
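One way to read the suggestion above is to create the instance lazily, on the first cache miss, and reuse it for the rest of the pass. The sketch below assumes hypothetical `SketchCreator`, `SketchInstance`, and `ChunkItem` types and a dummy `instrument` transform; it is not the code from this PR.

```rust
// Hypothetical stand-in for a runtime instance that can instrument code.
struct SketchInstance;

impl SketchInstance {
    fn instrument(&mut self, original: &[u8]) -> Vec<u8> {
        // Dummy deterministic transform in place of real instrumentation.
        original.iter().rev().copied().collect()
    }
}

// Hypothetical creator that counts how many instances it hands out.
struct SketchCreator {
    instantiations: usize,
}

impl SketchCreator {
    fn instantiate(&mut self) -> SketchInstance {
        self.instantiations += 1;
        SketchInstance
    }
}

struct ChunkItem {
    original: Vec<u8>,
    cached: Option<Vec<u8>>,
}

// Resolves code for every item in a chunk, creating at most one instance per pass.
fn instrument_chunk(creator: &mut SketchCreator, items: &[ChunkItem]) -> Vec<Vec<u8>> {
    let mut instance: Option<SketchInstance> = None;
    let mut out = Vec::with_capacity(items.len());
    for item in items {
        let code = match &item.cached {
            Some(code) => code.clone(),
            // Instantiate only on the first miss, then reuse the same instance.
            None => instance
                .get_or_insert_with(|| creator.instantiate())
                .instrument(&item.original),
        };
        out.push(code);
    }
    out
}
```

A chunk whose items are all cached never instantiates anything, which keeps the common path free of the extra cost.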

References

Code instrumentation is recognized as a computationally heavy 'main operation' in this repository, and its performance impact should be considered during implementation.

playX18 · 2026-04-30T07:36:56Z

@cursor review

playX18 · 2026-04-30T07:37:07Z

@codex review

playX18 · 2026-04-30T07:37:11Z

@claude review

claude · 2026-04-30T07:37:34Z

Claude finished @playX18's task in 1m 49s.

Code Review

Summary: This PR adds a lazy fallback in instrumented_code_and_metadata so that if the instrumented code / metadata are absent from the DB (e.g. after a runtime version bump), the processor instruments the stored original code on-the-fly and persists the result. The change is well-targeted, the logic is correct, and the instance_creator field visibility bump is minimal. A few points worth discussing:

medium: misleading error when instrument() returns None

File: ethexe/processor/src/handling/run/mod.rs lines 444–446

let (instrumented_code, code_metadata) = instance
    .instrument(&original_code)?
    .ok_or(ProcessorError::MissingInstrumentedCodeForProgram(code_id))?;

instrument() returns Ok(None) when the WASM can't be instrumented (invalid code). Mapping that to MissingInstrumentedCodeForProgram conflates "nothing in the DB" with "instrumentation produced no output." A dedicated variant like InstrumentationFailed(CodeId) would make debugging easier — especially since MissingInstrumentedCodeForProgram was previously only reachable if the DB lookup returned None, not if live instrumentation silently failed.

That said, this path is guarded upstream: handle_router_event rejects programs whose code_id is not marked code_valid, so in practice Ok(None) here would be a programmer error rather than a runtime condition.
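For illustration only, the distinction being asked for could look roughly like the sketch below. `SketchError` and both helper functions are hypothetical; note that the commit list for this PR later includes "remove InstrumentationFailed case", so this shows the reviewer's idea rather than the merged shape of the code.

```rust
use std::fmt;

type CodeId = u32;

// Hypothetical error type with one variant per failure cause.
#[derive(Debug, PartialEq)]
enum SketchError {
    // Nothing was stored in the DB for this code id.
    MissingInstrumentedCode(CodeId),
    // Live instrumentation ran but produced no output.
    InstrumentationFailed(CodeId),
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::MissingInstrumentedCode(id) => {
                write!(f, "no instrumented code stored for code {}", id)
            }
            SketchError::InstrumentationFailed(id) => {
                write!(f, "instrumentation produced no output for code {}", id)
            }
        }
    }
}

// A DB miss maps to the "missing" variant.
fn from_cache(lookup: Option<Vec<u8>>, id: CodeId) -> Result<Vec<u8>, SketchError> {
    lookup.ok_or(SketchError::MissingInstrumentedCode(id))
}

// An empty instrumentation result maps to the "failed" variant.
fn from_live_instrumentation(
    result: Option<Vec<u8>>,
    id: CodeId,
) -> Result<Vec<u8>, SketchError> {
    result.ok_or(SketchError::InstrumentationFailed(id))
}
```

With separate variants, a log line alone tells the reader whether the cache was empty or the instrumenter rejected the code.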

low: RPC re-instrumentation on every call

File: ethexe/processor/src/handling/run/mod.rs lines 448–453 (also overlaid.rs:188)

In OverlaidRunContext, db is an overlaid (in-memory) database. set_instrumented_code / set_code_metadata writes go to the overlay only — they're discarded after the RPC call. So every execute_for_reply call for a program with missing instrumented code pays the full instrumentation cost. Since InstanceCreator::instantiate is InstancePre-backed it's not unbounded, but it's still non-trivial.

This is acceptable for an uncommon fallback path (missing instrumented code after a runtime upgrade should be temporary), but worth a comment so future readers don't wonder why writes appear to be no-ops for RPC callers.

low: partial-state non-issue but worth noting

If set_instrumented_code succeeds and set_code_metadata subsequently panics, the DB holds instrumented code but no metadata. The early-exit check on lines 433–437 requires both to be present, so the next call retries instrumentation and overwrites the stale entry. This is idempotent and safe. No action needed, just confirming this was considered.
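The retry behaviour can be modelled in a few lines. This is a sketch with hypothetical names (`SketchDb`, `cache_hit`, `ensure_instrumented`) and a dummy transform, assuming the early-exit check treats a half-written entry as a miss.

```rust
use std::collections::HashMap;

type CodeId = u32;

#[derive(Default)]
struct SketchDb {
    instrumented: HashMap<CodeId, Vec<u8>>,
    metadata: HashMap<CodeId, usize>,
}

// A hit requires BOTH entries; a half-written state counts as a miss.
fn cache_hit(db: &SketchDb, id: CodeId) -> Option<(&Vec<u8>, &usize)> {
    db.instrumented.get(&id).zip(db.metadata.get(&id))
}

// Returns true if instrumentation ran (and overwrote any partial state).
fn ensure_instrumented(db: &mut SketchDb, id: CodeId, original: &[u8]) -> bool {
    if cache_hit(db, id).is_some() {
        return false;
    }
    // Dummy deterministic transform: same input always yields the same bytes.
    let code: Vec<u8> = original.iter().map(|b| b.wrapping_add(1)).collect();
    db.instrumented.insert(id, code);
    db.metadata.insert(id, original.len());
    true
}
```

Because the transform is deterministic, overwriting a leftover instrumented entry yields the same bytes it already held.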

low: test only covers the happy path

File: ethexe/processor/src/tests.rs lines 319–377

The new test verifies that after process_queues, both instrumented_code and code_metadata are populated. Consider also testing:

MissingOriginalCodeForProgram: drop the set_original_code call — ensures the new error variant is reachable and correct.
The overlaid path: confirm that execute_for_reply doesn't return an error when instrumented code is absent (even if it doesn't persist).

nit: CodesStorageRW import placement

CodesStorageRW is only used in the fallback branch of instrumented_code_and_metadata. It's a minor point, but the trait import at the top of the file is now always in scope even when the fallback never fires — fine as-is, just noting it.

Overall this is a clean fix. The logic is sound given that handle_router_event already guards against invalid code_ids being registered. Main open question is whether the instrument() → None case deserves a distinct error variant for observability.

chatgpt-codex-connector · 2026-04-30T07:42:07Z

Codex Review: Didn't find any major issues. Can't wait for the next one!

cursor

✅ Bugbot reviewed your changes and found no new issues!

Reviewed by Cursor Bugbot for commit 5207bb5.

claude · 2026-05-04T01:15:09Z

Claude finished @playX18's task in 2m 34s.

Code Review

Summary: This PR extracts instrumented_code_and_metadata into a standalone function and adds a lazy fallback: when instrumented code / metadata is absent from the DB (e.g., after a runtime version bump), the processor instruments the original code on-the-fly and persists the result. The scope is narrow and the logic is sound for the happy path. A few issues worth addressing before merge:

medium: violates the documented "no KV writes from processor" invariant

File: ethexe/processor/src/lib.rs lines 151–153

The crate-level doc currently reads:

// - Processor is designed to write only in CAS, it must NEVER modify
//   key-value storage from Database.

Both set_instrumented_code and set_code_metadata are KV (not CAS) writes, and they're now called from within instrumented_code_and_metadata which is invoked directly from the queue-processing hot path. The safety argument (instrumentation is deterministic, writes are idempotent, all validators arrive at the same bytes) is reasonable, but the invariant should be updated to document the exception — otherwise the next reader will either assume this is a bug or cargo-cult the old constraint.

medium: instrument() → None aliased to the wrong error variant

File: ethexe/processor/src/handling/run/mod.rs lines 444–446

let (instrumented_code, code_metadata) = instance
    .instrument(&original_code)?
    .ok_or(ProcessorError::MissingInstrumentedCodeForProgram(code_id))?;

instrument() returns Ok(None) when the WASM is structurally invalid. Mapping that to MissingInstrumentedCodeForProgram conflates two distinct situations: "the cache is empty" (DB lookup returned None) versus "live instrumentation produced no output." A dedicated variant like InstrumentationFailed(CodeId) would make the distinction clear in logs and error propagation.

In practice this path is guarded upstream — handle_router_event only registers programs whose code_id is marked code_valid, so Ok(None) here would signal a serious internal inconsistency rather than a routine condition. All the more reason the error name should reflect that.

low: overlaid DB writes are ephemeral — re-instruments on every RPC call

File: ethexe/processor/src/handling/overlaid.rs line 188

OverlaidRunContext holds an overlaid (in-memory, copy-on-write) database. The set_instrumented_code / set_code_metadata writes inside instrumented_code_and_metadata go to the overlay and are discarded when the overlay is dropped. This means every execute_for_reply call for a program whose instrumentation is missing will pay the full instance_creator.instantiate() + instrument() cost. This is an edge case (missing instrumented code is expected only briefly after a version bump), but a short comment noting the limitation would help future readers.

low: parallel chunk execution may redundantly instrument the same code_id

File: ethexe/processor/src/handling/run/mod.rs (spawn_chunk_execution path)

Multiple programs in a chunk may share the same code_id. If all of them miss instrumented code in the DB (e.g. immediately after a runtime version bump), each parallel worker will independently call instantiate() + instrument() for the same bytes and race to write to the DB. The writes are idempotent (identical content → identical output) and the DB layer handles concurrent writes safely, so correctness is not at risk. But it could cause noticeable latency in a chunk where every program's code is missing — worth a comment.
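One possible mitigation, sketched here under assumptions and not taken from this PR, is to collect the distinct uncached code ids before fanning out, so each is instrumented once. `codes_to_instrument` and the `(ProgramId, CodeId)` pairs are hypothetical.

```rust
use std::collections::BTreeSet;

type CodeId = u32;
type ProgramId = u32;

// Returns the distinct code ids in a chunk that still need instrumentation.
fn codes_to_instrument(
    programs: &[(ProgramId, CodeId)],
    cached: &BTreeSet<CodeId>,
) -> BTreeSet<CodeId> {
    programs
        .iter()
        .map(|&(_, code_id)| code_id)
        .filter(|code_id| !cached.contains(code_id))
        .collect() // the set drops duplicates shared by several programs
}
```

Whether the extra pass is worth it depends on how often programs in a chunk actually share uncached code, which the review does not quantify.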

low: new test only covers the happy path

File: ethexe/processor/src/tests.rs lines 321–373

The test verifies that after process_queues with missing instrumented code, both instrumented_code and code_metadata are populated — good. Two additional cases would complete coverage of the new code:

MissingOriginalCodeForProgram path: skip the db.set_original_code(&code) call and assert that process_queues returns Err(ProcessorError::MissingOriginalCodeForProgram(_)).
Overlaid path: call execute_for_reply with instrumented code absent from the overlay and verify it succeeds (exercises the overlay write-then-discard behavior without panic).
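The shape of the first suggested assertion can be sketched with `matches!`. The function `process_queues_sketch` and the single-variant `SketchError` below are hypothetical placeholders; the real test would call the processor's own API and error type.

```rust
type CodeId = u32;

#[derive(Debug)]
enum SketchError {
    MissingOriginalCodeForProgram(CodeId),
}

// Placeholder for queue processing: fails if no original code is stored.
fn process_queues_sketch(
    original_code: Option<&[u8]>,
    code_id: CodeId,
) -> Result<usize, SketchError> {
    let code = original_code.ok_or(SketchError::MissingOriginalCodeForProgram(code_id))?;
    Ok(code.len())
}

// Mirrors the proposed negative test: no original code, so the error variant is expected.
fn missing_original_code_is_reported() -> bool {
    matches!(
        process_queues_sketch(None, 3),
        Err(SketchError::MissingOriginalCodeForProgram(3))
    )
}
```

Matching on the variant, instead of comparing whole error values, keeps the test independent of whether the error type implements `PartialEq`.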

Overall this is a clean, well-targeted fix. The main asks are: update the broken invariant comment in the crate doc, and rename the error produced when instrument() returns None.

feat(ethexe/processor): instrument code lazily inside processing

a0090db

playX18 requested review from StackOverflowExcept1on, breathx, ecol-master and grishasobol April 30, 2026 07:20

playX18 self-assigned this Apr 30, 2026

playX18 added A0-pleasereview labels Apr 30, 2026

Merge branch 'master' into ap/process-instrument-lazy

d10976c

gemini-code-assist Bot reviewed Apr 30, 2026

View reviewed changes

clippy

5207bb5

cursor Bot reviewed Apr 30, 2026

View reviewed changes

Merge branch 'master' into ap/process-instrument-lazy

c49cabc

playX18 added 2 commits May 4, 2026 10:09

feat(ethexe/processor): only create one instrumentation instance per spawn_chunk_execution.

ef6a8c0

remove InstrumentationFailed case

5b0107e

breathx removed A0-pleasereview labels May 5, 2026

playX18 wants to merge 6 commits into master from ap/process-instrument-lazy

2 participants
