
feat: zero-config Java projects + smart ReplayHelper for end-to-end optimization #1880

Merged
misrasaurabh1 merged 30 commits into main from java-config-redesign on Apr 1, 2026


misrasaurabh1 commented Mar 20, 2026

Summary

Eliminates codeflash.toml for Java projects and fixes the complete trace → optimize pipeline to work end-to-end on real Java projects (validated on aerospike-client-java).

Zero-config Java support

  • Auto-detect Java projects from pom.xml / build.gradle — no config file needed
  • Read custom settings from pom.xml <properties> or gradle.properties (codeflash.* keys)
  • Multi-module Maven scanning: parses each module's <sourceDirectory> / <testSourceDirectory>, picks module with most Java files as source root
  • Deleted all codeflash.toml files
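The property lookup above can be sketched as follows, assuming settings live as codeflash.<name> elements in pom.xml or as codeflash.<name>=value lines in gradle.properties. The function names, the example keys (codeflash.moduleRoot, codeflash.testsRoot), and the regex-based parsing are illustrative assumptions, not codeflash's actual implementation.

```python
import re
from pathlib import Path

# Assumption: codeflash.* settings are simple <codeflash.key>value</codeflash.key>
# elements. For brevity this matches them anywhere in the file rather than
# only inside <properties>.
_POM_PROPERTY = re.compile(r"<(codeflash\.[\w.-]+)>\s*([^<]*?)\s*</\1>")


def read_pom_properties(pom_text: str) -> dict[str, str]:
    """Return {key: value} for every codeflash.* element in pom.xml text."""
    return dict(_POM_PROPERTY.findall(pom_text))


def read_gradle_properties(props_text: str) -> dict[str, str]:
    """Return {key: value} for every codeflash.* line in gradle.properties text."""
    result: dict[str, str] = {}
    for raw_line in props_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip().startswith("codeflash."):
            result[key.strip()] = value.strip()
    return result


def read_codeflash_settings(project_dir: Path) -> dict[str, str]:
    """Check pom.xml first, then gradle.properties; {} means "use detected defaults"."""
    pom = project_dir / "pom.xml"
    if pom.is_file():
        return read_pom_properties(pom.read_text(encoding="utf-8"))
    gradle_props = project_dir / "gradle.properties"
    if gradle_props.is_file():
        return read_gradle_properties(gradle_props.read_text(encoding="utf-8"))
    return {}
```

In this sketch an empty result is not an error: a project without any codeflash.* keys simply falls through to build-tool conventions.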

Smart ReplayHelper (behavior + performance parity)

  • ReplayHelper.replay() now reads CODEFLASH_MODE env var and produces the same output as existing test instrumentation
  • Behavior mode: captures return value via Kryo, writes to SQLite test_results table for correctness comparison
  • Performance mode: runs inner loop for JIT warmup, prints timing markers matching the optimizer's expected format
  • No mode: just invokes (trace-only or manual testing)
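The mode switch can be modeled in a few lines. This is a Python sketch of the control flow only: the real ReplayHelper is Java and serializes with Kryo, whereas here pickle stands in for Kryo, and the test_results column names, the invocation_id argument, and the timing marker format are all hypothetical.

```python
import os
import pickle
import sqlite3
import time
from typing import Any, Callable


def replay(func: Callable[..., Any], args: tuple, conn: sqlite3.Connection,
           invocation_id: str, inner_loops: int = 5) -> Any:
    """Invoke func(*args) and record output according to CODEFLASH_MODE."""
    mode = os.environ.get("CODEFLASH_MODE")
    if mode == "behavior":
        result = func(*args)
        # Persist the serialized return value so baseline and candidate runs
        # can later be compared row by row (hypothetical column names).
        conn.execute(
            "INSERT INTO test_results (invocation_id, return_value) VALUES (?, ?)",
            (invocation_id, pickle.dumps(result)),
        )
        conn.commit()
        return result
    if mode == "performance":
        result = None
        # Repeated calls mirror the Java inner loop used for JIT warm-up.
        for iteration in range(inner_loops):
            start = time.perf_counter_ns()
            result = func(*args)
            elapsed_ns = time.perf_counter_ns() - start
            print(f"TIMING {invocation_id} {iteration} {elapsed_ns}")  # hypothetical format
        return result
    # No mode set: plain invocation (trace-only or manual testing).
    return func(*args)
```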

Bug fixes

  • JFR parser: normalize / to . in class names (JVM internal format com/example vs Java package format com.example)
  • Graceful timeout: send SIGTERM before SIGKILL so JFR can dump recording and shutdown hooks run
  • TracingTransformer: remove isRecording() check that prevented instrumenting classes loaded during serialization (was causing 3 captures instead of 10,000+)
  • Replay test generator: JUnit 4 support (org.junit.Test vs org.junit.jupiter.api.Test), detect from project build config
  • Overloaded methods: global counter per method name to avoid duplicate replay test method names
  • Instrumentation: fix _add_behavior_instrumentation for compact @Test lines (annotation + signature on same line)
  • project_root: use build root directory (not sub-module) for multi-module Maven projects
  • optimize subparser: add_help=False so -h in Java commands isn't intercepted as --help
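The JFR class-name normalization in the first bug fix can be illustrated with a small sketch. It assumes execution samples have already been reduced to (class name, method name) pairs in JVM internal form; the function names and that input shape are assumptions, and the PR's parser also computes addressable time, which this sketch omits.

```python
from collections import Counter


def normalize_class_name(jvm_name: str) -> str:
    """Convert a JVM internal name (com/example/Foo) to package form (com.example.Foo)."""
    return jvm_name.replace("/", ".")


def rank_methods(samples: list[tuple[str, str]], package_prefix: str) -> list[tuple[str, int]]:
    """Count CPU samples per Class.method inside package_prefix, hottest first."""
    counts: Counter[str] = Counter()
    for jvm_class, method in samples:
        class_name = normalize_class_name(jvm_class)
        # Without normalization this prefix check never matches slash-separated names.
        if class_name.startswith(package_prefix + "."):
            counts[f"{class_name}.{method}"] += 1
    return counts.most_common()
```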

Validated end-to-end on aerospike-client-java

  • 10,500+ invocations traced across 282 methods
  • 41 functions ranked by JFR CPU profiling data
  • 55 replay test files generated (JUnit 4 compatible)
  • Replay tests compile, run, and pass (129 tests for Crypto.computeDigest)
  • Behavior baseline established with timing data (4.81ms over 119 loops)
  • Candidates correctly verified and rejected when behavior doesn't match

Test plan

  • 33 config detection tests (build tool, source/test root, Maven/Gradle properties, multi-module)
  • 13 JFR parser tests (normalization, filtering, ranking, timeout, project_root)
  • 10 replay test generation tests (JUnit 4/5, overloads, instrumentation)
  • 8 tracer e2e tests (agent capture, replay generation, orchestration)
  • 6 integration tests (full pipeline: discover → rank → compile)
  • 2 replay test discovery tests
  • Full optimizer pipeline on aerospike benchmark: trace → discover → rank → optimize → verify

🤖 Generated with Claude Code

misrasaurabh1 and others added 7 commits March 19, 2026 19:11
…iles

Java projects no longer need a standalone config file. Codeflash reads
config from pom.xml <properties> or gradle.properties, and auto-detects
source/test roots from build tool conventions.

Changes:
- Add parse_java_project_config() to read codeflash.* properties from
  pom.xml and gradle.properties
- Add multi-module Maven scanning: parses each module's pom.xml for
  <sourceDirectory> and <testSourceDirectory>, picks module with most
  Java files as source root, identifies test modules by name
- Route Java projects through build-file detection in config_parser.py
  before falling back to pyproject.toml
- Detect Java language from pom.xml/build.gradle presence (no config needed)
- Fix project_root for multi-module projects (was resolving to sub-module)
- Fix JFR parser / separators (JVM uses com/example, normalized to com.example)
- Fix graceful timeout (SIGTERM before SIGKILL for JFR dump + shutdown hooks)
- Remove isRecording() check from TracingTransformer (was preventing class
  instrumentation for classes loaded during serialization)
- Delete all codeflash.toml files from fixtures and code_to_optimize
- Add 33 config detection tests
- Update docs for zero-config Java setup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
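The multi-module source-root heuristic described in this commit can be sketched like this. It is a simplified model under stated assumptions: it reads <module> entries from the root pom.xml, honors a literal <sourceDirectory> in each module's pom.xml (no ${...} property expansion), falls back to Maven's default src/main/java, and picks the module with the most .java files. All function names are hypothetical.

```python
import re
import tempfile
from pathlib import Path

_MODULE = re.compile(r"<module>\s*([^<\s]+)\s*</module>")
_SOURCE_DIR = re.compile(r"<sourceDirectory>\s*([^<]+?)\s*</sourceDirectory>")


def module_source_dir(module: Path) -> Path:
    """Return the module's <sourceDirectory>, or Maven's default src/main/java."""
    pom = module / "pom.xml"
    text = pom.read_text(encoding="utf-8") if pom.is_file() else ""
    match = _SOURCE_DIR.search(text)
    return module / (match.group(1) if match else "src/main/java")


def count_java_files(module: Path) -> int:
    src = module_source_dir(module)
    return sum(1 for _ in src.rglob("*.java")) if src.is_dir() else 0


def pick_source_module(root: Path) -> Path:
    """Pick the module with the most Java files; single-module projects return root."""
    root_pom = (root / "pom.xml").read_text(encoding="utf-8")
    modules = [root / name for name in _MODULE.findall(root_pom)] or [root]
    return max(modules, key=count_java_files)


if __name__ == "__main__":
    # Demo: a "client" module with 3 sources beats an "examples" module with 1.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "pom.xml").write_text(
            "<project><modules><module>client</module>"
            "<module>examples</module></modules></project>"
        )
        for name, count in (("client", 3), ("examples", 1)):
            src = root / name / "src" / "main" / "java"
            src.mkdir(parents=True)
            for i in range(count):
                (src / f"C{i}.java").write_text("class C {}")
        print(pick_source_module(root).name)  # prints: client
```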
Replay tests call helper.replay() via reflection, not the target function
directly. The behavior instrumentation can't wrap indirect calls and
produces malformed output (code emitted outside class body) for large
replay test files. For replay tests, just rename the class without
adding instrumentation — JUnit pass/fail results verify correctness.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Detect test framework from project build config and generate replay
tests with appropriate imports (org.junit.Test for JUnit 4,
org.junit.jupiter.api.Test for JUnit 5). Fixes compilation failures
on projects using JUnit 4 (like aerospike-client-java).

Also passes test_framework through run_java_tracer to
generate_replay_tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ay tests

Use a global counter per method name across all descriptors to generate
unique test method names. Previously, overloaded methods (same name,
different descriptor) would generate duplicate replay_methodName_N
methods, causing compilation errors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
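A sketch of the naming fix, assuming the generator sees each traced method as a (name, JVM descriptor, invocation count) tuple; the function name and input shape are hypothetical. Keying the counter by name alone, rather than by (name, descriptor), is what keeps overloads from producing the same replay_<name>_<N> twice.

```python
from collections import defaultdict


def replay_test_method_names(methods: list[tuple[str, str, int]]) -> list[str]:
    """Generate unique replay_<name>_<N> names across all overloads of a method."""
    counters: defaultdict[str, int] = defaultdict(int)  # one counter per method name
    names: list[str] = []
    for name, _descriptor, invocations in methods:
        for _ in range(invocations):
            names.append(f"replay_{name}_{counters[name]}")
            counters[name] += 1
    return names
```

Restarting the counter for each descriptor would emit replay_add_0 and replay_add_1 twice for the two add overloads, which is the duplicate-method compile error the commit describes.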
…on skip

10 new tests covering:
- JUnit 5 replay test generation (imports, class visibility)
- JUnit 4 replay test generation (imports, public methods, @AfterClass)
- Overloaded method handling (no duplicate test method names)
- Instrumentation skip for replay tests (behavior + perf mode)
- Regular tests still get instrumented normally

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…solution

13 new tests covering:
- JFR class name normalization (/ to . conversion)
- Package-based sample filtering
- Addressable time calculation from JFR samples
- Method ranking order and format
- Graceful timeout (SIGTERM before SIGKILL)
- Multi-module project root detection (Path not str)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The behavior instrumentation was producing malformed output for compact
@Test lines (annotation + method signature on same line, common in
replay tests). The method signature collection loop would skip past the
opening brace and consume subsequent methods' content.

Fix: detect when the @Test annotation line already contains { and treat
it as both annotation and method signature, avoiding the separate
signature search that was over-consuming lines.

Reverted the instrumentation skip for replay tests — they now get
properly instrumented for both behavior capture and performance timing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
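The compact-line check can be sketched as a small helper that locates the line opening a test method's body. The helper name is hypothetical and the brace test is deliberately naive (it ignores braces inside strings or comments); it only illustrates not scanning past a line that already contains {.

```python
def find_body_start(lines: list[str], test_line: int) -> int:
    """Return the index of the line that opens the body of the @Test method at test_line."""
    # Compact form: "@Test public void replay_foo_0() throws Throwable {"
    # The annotation line already holds the signature, so stop here instead of
    # scanning forward into the next method's lines.
    if "{" in lines[test_line]:
        return test_line
    # Split form: annotation on its own line, signature (and brace) below it.
    for index in range(test_line + 1, len(lines)):
        if "{" in lines[index]:
            return index
    raise ValueError(f"no method body found after line {test_line}")
```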

claude bot commented Mar 20, 2026

Claude finished @HeshamHM28's task in 4s.


PR Review Summary

Prek Checks

prek run --from-ref origin/main — clean, no issues.

Mypy reports 19 errors across models.py and discovery/functions_to_optimize.py, but these are all pre-existing (the PR only changes one line in models.py: runtime is not None). Not introduced by this PR.

Code Review

Binary file committed

codeflash/languages/java/resources/codeflash-runtime-1.0.0.jar is a binary file that changed in this PR (15.97 MB → 15.98 MB). This is expected since ReplayHelper.java and the tracer were updated, but binary files in git make diffs unreadable and repo size grows permanently. Consider whether this JAR should be built from source during CI (via mvn package) and excluded from source control via .gitignore, or if committing it is an intentional design choice for simplicity.

getCallingTestClassName() heuristic is fragile

ReplayHelper.java:getCallingTestClassName() walks the stack looking for a class name containing "ReplayTest" or "replay". Generated test classes are named ReplayTest_<classname> so this works for the happy path. However:

  • The fallback (stack[3].getClassName()) could return a JUnit framework internal class (org.junit.runners.*) rather than the test class when JUnit uses reflection-based invocation
  • For manually written tests that call helper.replay() from a class not matching these strings, the behavior mode will associate results with the wrong class name, causing comparison failures

A more robust approach would be to pass the test class name explicitly to replay() — but that would change the generated test format. At minimum, document this constraint.

Windows: SIGTERM not graceful

_run_java_with_graceful_timeout() in codeflash/languages/java/tracer.py:41 calls proc.send_signal(signal.SIGTERM). On Windows, this calls TerminateProcess() — the same as proc.kill(), so JFR can't dump its recording. The subsequent proc.kill() in the except handler is then a no-op on an already-dead process. Low severity (JFR/Java profiling is primarily Linux/macOS), but the code doesn't log a warning or handle the Windows case.
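A minimal sketch of the SIGTERM-then-SIGKILL pattern under discussion, including the Windows warning the review suggests. The function name, the grace period, and the log message are assumptions; the point is that on POSIX a SIGTERM gives the JVM a chance to run shutdown hooks (and JFR to dump its recording) before any hard kill.

```python
import logging
import os
import signal
import subprocess
import sys

logger = logging.getLogger(__name__)


def run_with_graceful_timeout(cmd: list[str], timeout: float, grace: float = 5.0) -> int:
    """Run cmd; on timeout ask it to stop (SIGTERM) before forcing it (SIGKILL)."""
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        if os.name == "nt":
            # Popen.send_signal(SIGTERM) calls TerminateProcess() on Windows,
            # so no shutdown hooks run and a JFR recording may be lost.
            logger.warning("Timed out on Windows; process will be terminated without cleanup")
        proc.send_signal(signal.SIGTERM)
        try:
            return proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()  # escalate only if the process ignored SIGTERM
            return proc.wait()


if __name__ == "__main__":
    slow = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_with_graceful_timeout(slow, timeout=1.0))  # nonzero: stopped by SIGTERM
```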

detect_java_project called twice at trace time

In codeflash/tracer.py:388-392, detect_java_project(project_root) is called to get test_framework. This is a redundant detection since the same config was already read during process_pyproject_config. If JavaProjectConfig.test_framework were propagated into the argparse Namespace during setup (similar to how tests_root and module_root are), this extra filesystem scan could be avoided. Minor, but worth a follow-up.

replayBehavior invocation ID format

In ReplayHelper.java:replayBehavior(), invId = testMethodName + "_" + testIteration. The comment says testIteration goes at the END so the comparator's lastUnderscore stripping can make baseline/candidate keys match. However, testMethodName itself could already contain underscores (e.g., replay_computeDigest_42). If the comparator strips only the last underscore segment, this should be fine — but this is subtle coupling between the Java runtime and the Python comparator that is not documented anywhere. Adding a comment linking to the Python-side comparator logic would help future maintainers.
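To make that coupling concrete, here is a sketch of the Python-side key normalization the review describes, assuming the comparator strips exactly one trailing _<testIteration> segment. The function name is hypothetical.

```python
def invocation_key(invocation_id: str) -> str:
    """Strip the trailing _<testIteration> so baseline and candidate IDs line up.

    rpartition splits on the LAST underscore only, so underscores inside the
    test method name (replay_computeDigest_42) survive.
    """
    head, sep, _iteration = invocation_id.rpartition("_")
    return head if sep else invocation_id
```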

Duplicate Detection

No duplicates detected. The new functions (_run_java_with_graceful_timeout, generate_replay_tests JUnit4 support, getCallingTestClassName/MethodName) are unique to their modules.

Test Coverage

All 23 new tests pass (test_jfr_parser.py: 13 tests, test_replay_test_generation.py: 10 tests).

File                Coverage
jfr_parser.py       71%
replay_test.py      76%
instrumentation.py  61%
tracer.py (java)    33%
maven_strategy.py   9% (largely pre-existing)

tracer.py at 33% is low but most uncovered code is subprocess orchestration (_run_java_with_graceful_timeout, JavaTracer.trace()) that is inherently difficult to unit test. The graceful timeout logic has dedicated tests in TestGracefulTimeout. Acceptable for now.


Last updated: 2026-04-01T20:13Z

ReplayHelper now reads CODEFLASH_MODE env var and produces the same
output as the existing test instrumentation:

- Behavior mode: captures return value via Kryo serialization, writes
  to SQLite (test_results table) for correctness comparison, prints
  start/end timing markers
- Performance mode: runs inner loop for JIT warmup, prints timing
  markers for each iteration matching the expected format
- No mode: just invokes the method (trace-only or manual testing)

This achieves feature parity with the existing test instrumentation
for replay tests, which call functions via reflection and can't be
wrapped by text-level instrumentation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
misrasaurabh1 changed the title from "Java config redesign + bugfixs for Tracer" to "feat: zero-config Java projects + smart ReplayHelper for end-to-end optimization" on Mar 20, 2026
…ay tests + speedups

- Trigger on any codeflash/** or tests/** changes (not just java subset)
- Validate replay test files are discovered per-function
- Already validates: replay test generation, global discovery count,
  optimization success, and minimum speedup percentage

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
github-actions bot added the workflow-modified label (This PR modifies GitHub Actions workflows) on Mar 20, 2026
misrasaurabh1 and others added 6 commits March 19, 2026 22:40
The refactored Java project_root handling moved args.tests_root
resolution after the project_root_from_module_root call, which passed
a string instead of a Path. Restore the original order: resolve
tests_root to Path first, then set test_project_root, then override
both for Java multi-module projects.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use Path comparisons instead of forward-slash substring matching
- Avoid parse_args() in test (reads stdin on Windows) — use Namespace directly

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use print(flush=True) instead of logging.info for subprocess output so
CI logs show progress in real-time instead of buffering until completion.
Also set PYTHONUNBUFFERED=1 for the subprocess.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
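The effect of this change can be sketched as follows: the child inherits PYTHONUNBUFFERED=1 so it writes each line immediately, and the parent echoes lines with print(flush=True) as they arrive. The function name is hypothetical, and stdout/stderr are merged here for simplicity.

```python
import os
import subprocess
import sys


def run_streaming(cmd: list[str]) -> int:
    """Echo a child's combined output line by line while it runs; return its exit code."""
    env = {**os.environ, "PYTHONUNBUFFERED": "1"}  # child Python flushes every write
    with subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, env=env
    ) as proc:
        for line in proc.stdout:  # reads lines as the child emits them
            print(line, end="", flush=True)  # flush so CI logs update in real time
    return proc.returncode  # set once the with-block has waited for the child


if __name__ == "__main__":
    run_streaming([sys.executable, "-c", "import time\nfor i in range(3): print(i); time.sleep(1)"])
```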
…_write_gradle_properties

Co-authored-by: Saurabh Misra <undefined@users.noreply.github.com>
…ions harder

- Set jdk.ExecutionSample#period=1ms (default was 10ms) so JFR captures
  samples from shorter-running programs
- Workload.main now runs 1000 rounds with larger inputs so JFR can
  capture method-level CPU samples (repeatString with O(n²) concat
  dominates ~75% of samples)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ove priority

Replace xml.etree.ElementTree with text-based regex manipulation in
_write_maven_properties() and _remove_java_build_config(). ElementTree
destroys XML comments, mangles namespace declarations (ns0: prefixes),
and reformats whitespace. The new approach reads/writes pom.xml as plain
text, only touching codeflash.* property lines.

Also extracts duplicated key_map to shared _MAVEN_KEY_MAP constant and
aligns remove priority to check pom.xml first (matching write order).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
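A sketch of the text-based approach, assuming a single top-level <properties> block one level below <project>. The function name is hypothetical and the regexes are simplified (properties inside Maven profiles are not distinguished, for example), but it shows why only the touched line changes: the rest of pom.xml is never parsed or re-serialized.

```python
import re


def upsert_maven_property(pom_text: str, key: str, value: str) -> str:
    """Set <key>value</key> in pom.xml text, editing only that one line."""
    tag = re.escape(key)
    existing = re.compile(rf"(<{tag}>)[^<]*(</{tag}>)")
    if existing.search(pom_text):
        # Update in place; a function replacement avoids backslash escaping issues.
        return existing.sub(lambda m: m.group(1) + value + m.group(2), pom_text, count=1)

    props = re.search(r"^([ \t]*)<properties>[^\n]*\n", pom_text, flags=re.MULTILINE)
    if props is None:
        raise ValueError("pom.xml has no <properties> section")  # out of scope here
    rest = pom_text[props.end():]
    # Copy the indentation of the first child element, if one exists ...
    sibling = re.match(r"([ \t]*)<(?!/)", rest)
    # ... otherwise assume <properties> is one level deep and nest one more level.
    indent = sibling.group(1) if sibling else (props.group(1) * 2 or "    ")
    return pom_text[: props.end()] + f"{indent}<{key}>{value}</{key}>\n" + rest
```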
@mashraf-222

Review & fix: pom.xml formatting preservation (3c63b60)

Reviewed the PR changes and identified one genuine issue in the config writer.

What was fixed

_write_maven_properties() and _remove_java_build_config() destroyed pom.xml formatting: both used xml.etree.ElementTree, which strips XML comments, rewrites namespace declarations with ns0: prefixes, and reformats all whitespace. They were replaced with text-based regex manipulation that touches only codeflash.* property lines and preserves everything else. The new approach also detects the existing indentation from sibling elements so inserted properties match the file's style.

Additionally:

  • Extracted duplicated key_map dict to shared _MAVEN_KEY_MAP module constant
  • Aligned _remove_java_build_config() to check pom.xml first, matching write priority
  • Added 9 tests (7 for config_writer, 2 for detector confirming zero-config intent)

What was reviewed and left as-is

Confirmed the following are intentional design decisions, not bugs:

  • has_existing_config() returning True for any Java build file — this is the zero-config mechanism that skips the first-run setup flow
  • isRecording() guard removal in TracingTransformer — PR description explains this fixed captures going from 3 to 10,000+
  • add_help=False on optimize subparser — prevents argparse from intercepting -h meant for Java command passthrough
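The effect of add_help=False can be demonstrated with plain argparse. This sketch is not codeflash's actual CLI definition; the program name and the java command line are illustrative. With no -h/--help registered on the subcommand, argparse leaves -h among the unrecognized arguments that parse_known_args returns, where it can be passed through.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="codeflash")
    subcommands = parser.add_subparsers(dest="command")
    # add_help=False: "optimize" registers no -h/--help, so argparse will not
    # print help and exit when the wrapped Java command line contains -h.
    subcommands.add_parser("optimize", add_help=False)
    return parser


if __name__ == "__main__":
    args, passthrough = build_parser().parse_known_args(
        ["optimize", "java", "-jar", "app.jar", "-h"]
    )
    print(args.command, passthrough)  # optimize ['java', '-jar', 'app.jar', '-h']
```

With the default add_help=True on the subparser, the same argument list would print the optimize help text and exit instead.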

Validation

  • All 152 setup tests pass (including 9 new)
  • All 8 tracer E2E tests pass
  • Full Fibonacci E2E optimization: 279x speedup found, mark-as-success 200
  • prek clean (ruff check + format)

mashraf-222 and others added 3 commits March 24, 2026 16:50
…os (TODO-37)

Java detection in parse_config_file() short-circuited before the existing
depth-comparison logic, so a parent pom.xml would override a closer
package.json or pyproject.toml. Now all config sources are detected first
and the closest one to CWD wins.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
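A sketch of the closest-config rule, assuming detection reduces to checking for marker files while walking up from the working directory. The marker table and function name are hypothetical, and ties within one directory fall back to the table's order, which may not match codeflash's own tie-breaking.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical marker files per language.
CONFIG_MARKERS = {
    "java": ("pom.xml", "build.gradle", "build.gradle.kts"),
    "javascript": ("package.json",),
    "python": ("pyproject.toml",),
}


def find_closest_config(cwd: Path) -> Optional[tuple[str, Path]]:
    """Return (language, directory) for the nearest directory holding any marker."""
    for directory in (cwd, *cwd.parents):
        # Every language is checked at each level before moving up, so a child
        # package.json wins over a pom.xml in a parent directory.
        for language, markers in CONFIG_MARKERS.items():
            if any((directory / marker).is_file() for marker in markers):
                return language, directory
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "pom.xml").write_text("<project/>")
        web = root / "web"
        web.mkdir()
        (web / "package.json").write_text("{}")
        print(find_closest_config(web)[0])  # javascript
```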
…TODO-34, TODO-38)

TODO-34: TracingClassVisitor hardcoded line number to 0 because ASM's
visitMethod() doesn't provide line info. Added a pre-scan pass in
TracingTransformer.instrumentClass() that collects first line numbers
via visitLineNumber() before the instrumentation pass.

TODO-38: Serialization timeouts/failures silently dropped captures with
no visibility. Added AtomicInteger droppedCaptures counter and included
it in flush() metadata output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Changed detect_packages_from_source() from min(2, len) to min(3, len)
so com.aerospike.client.util produces prefix com.aerospike.client
instead of com.aerospike. This reduces instrumentation to the actual
source package instead of the entire organization namespace.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
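The depth change amounts to keeping up to three leading components of each package name. A minimal sketch, with hypothetical function names:

```python
def package_prefix(package: str, depth: int = 3) -> str:
    """Keep at most `depth` leading components of a dotted Java package name."""
    parts = package.split(".")
    return ".".join(parts[: min(depth, len(parts))])


def detect_package_prefixes(packages: list[str], depth: int = 3) -> set[str]:
    """Collapse the packages declared in source files into instrumentation prefixes."""
    return {package_prefix(p, depth) for p in packages}
```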

mashraf-222 commented Mar 24, 2026

3 commits addressing the remaining open TODOs from the code review:

1. 5942ae9e — Monorepo config priority

parse_config_file() used to short-circuit on the first Java build file found, before the existing depth-comparison logic for package.json/pyproject.toml. In a monorepo with a parent pom.xml and a child package.json, Java would incorrectly win. Now all config sources are detected first and the closest one to CWD wins.

2. 12921447 — Tracer line numbers + dropped captures

  • Line numbers: TracingClassVisitor hardcoded 0 because ASM's visitMethod() doesn't provide line info. Added a pre-scan pass in instrumentClass() that collects first line numbers via visitLineNumber() before the instrumentation pass.
  • Dropped captures: Serialization timeouts silently dropped invocations with no tracking. Added AtomicInteger droppedCaptures counter and included it in flush metadata.

3. 970c9f86 — Package detection scope

Changed detect_packages_from_source() from 2 package components to 3, so com.aerospike.client.util now yields the prefix com.aerospike.client (previously com.aerospike). This reduces the instrumentation blast radius from the entire organization namespace to the actual source package.

E2E validation

Fibonacci optimization: 258x speedup. Full pipeline (discovery → test gen → instrumentation → benchmarking → candidate evaluation) passed.

mashraf-222 added a commit that referenced this pull request Mar 25, 2026
…orepo subdirectory scanning

Adapt find_all_config_files() after rebasing on java-config-redesign (PR #1880):
- Java detected via pom.xml/build.gradle instead of codeflash.toml
- Add subdirectory scan for monorepo language subprojects (java/, js/ etc.)
- Extract _check_dir_for_configs() to eliminate duplicated detection logic
- Fix --all flag in multi-language mode (module_root wasn't available during resolution)
- Add Java project_root directory override in apply_language_config()
- Update all tests to use build-tool detection mocks and directory-based Java paths
- Add 5 new monorepo discovery tests (subdir Java, subdir JS, all-three, skip-hidden, root-wins)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Zero-config Java support (auto-detection from build files, codeflash.toml
elimination) is handled separately in cf-java-zero-config-strategy. This
commit strips those changes, keeping only bug fixes:
- JFR parser, ReplayHelper, instrumentation, replay tests
- Multi-module test root resolution
- JUnit 4/5 test framework detection
- add_help=False for optimize subparser

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ mashraf-222
❌ Ubuntu


Ubuntu seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

HeshamHM28 and others added 3 commits March 27, 2026 10:38
…candidate keys

  2. JFR tool not found — missing JAVA_HOME fallback
  3. JaCoCo coverage broken — -DargLine was overwriting JaCoCo's agent flag
  4. runtime=0 dropped: if result.runtime: treated a zero-nanosecond result as falsy and skipped it
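Item 4 is the classic truthiness pitfall: 0 is falsy, so a measured runtime of zero nanoseconds looked the same as a missing one. A sketch with a hypothetical result type:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InvocationResult:  # hypothetical stand-in for codeflash's result model
    runtime: Optional[int]  # nanoseconds; None means "not measured"


def measured_runtimes_buggy(results: list[InvocationResult]) -> list[int]:
    return [r.runtime for r in results if r.runtime]  # drops a genuine 0


def measured_runtimes(results: list[InvocationResult]) -> list[int]:
    return [r.runtime for r in results if r.runtime is not None]  # keeps 0, skips None
```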
HeshamHM28 force-pushed the java-config-redesign branch from 01cb9f8 to f3eecac on March 31, 2026 09:23
HeshamHM28 and others added 6 commits April 1, 2026 06:46
- Walk up parent directories when looking for mvnw wrapper, fixing
  multi-module projects where mvnw is in the root but optimizer runs
  from a submodule
- Respect user's --no-pr flag in Java tracer path instead of hardcoding
  no_pr=True, allowing PR creation from tracer-based optimizations
- Add --no-pr to e2e tracer test script

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
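Walking up for the wrapper can be sketched as below. The function name is hypothetical, and choosing mvnw.cmd on Windows is an assumption about how the wrapper is invoked there.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def find_maven_wrapper(start: Path) -> Optional[Path]:
    """Return the nearest Maven wrapper script at or above start, or None."""
    name = "mvnw.cmd" if os.name == "nt" else "mvnw"
    for directory in (start, *start.parents):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None  # caller can fall back to a system-wide mvn


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        submodule = root / "client"
        submodule.mkdir()
        (root / ("mvnw.cmd" if os.name == "nt" else "mvnw")).write_text("")
        print(find_maven_wrapper(submodule))  # the wrapper in root, found from the submodule
```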
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add mvn/gradle test suite examples, fix replay test description,
document current limitations (void methods, mvnw search, --add-opens).
Remove unverified claims.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
--no-pr is a top-level codeflash flag, not an optimize subcommand flag.
Placing it after optimize caused it to be passed to the JVM as an
unrecognized option.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nction matching

Replay test metadata now stores qualified names (e.g. Matrix4f.invertLocal)
instead of short names (invertLocal). This prevents mismatches when multiple
classes have methods with the same name, ensuring replay tests are correctly
mapped to their source functions during optimization.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
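The collision this fixes is easy to reproduce with a toy index. Matrix4f.invertLocal comes from the commit message; Matrix3f, the test IDs, and the function name are hypothetical.

```python
from collections import defaultdict


def index_replay_tests(entries: list[tuple[str, str]], qualified: bool = True) -> dict[str, list[str]]:
    """Group replay test IDs by Class.method, or by bare method name if qualified=False."""
    index: defaultdict[str, list[str]] = defaultdict(list)
    for qualified_name, test_id in entries:
        key = qualified_name if qualified else qualified_name.rsplit(".", 1)[-1]
        index[key].append(test_id)
    return dict(index)
```

Keyed by bare names, both tests land under invertLocal and would be attributed to whichever class is being optimized; qualified keys keep them apart.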
HeshamHM28 marked this pull request as ready for review on April 1, 2026 07:34
misrasaurabh1 and others added 2 commits April 1, 2026 12:45
fix(java): use qualified names in replay test metadata for correct function matching
misrasaurabh1 merged commit 15a9261 into main on Apr 1, 2026
65 of 75 checks passed
misrasaurabh1 deleted the java-config-redesign branch on April 1, 2026 22:04

Labels

workflow-modified This PR modifies GitHub Actions workflows


4 participants