feat: zero-config Java projects + smart ReplayHelper for end-to-end optimization by misrasaurabh1 · Pull Request #1880 · codeflash-ai/codeflash

misrasaurabh1 · 2026-03-20T03:20:02Z

Summary

Eliminates codeflash.toml for Java projects and fixes the complete trace → optimize pipeline to work end-to-end on real Java projects (validated on aerospike-client-java).

Zero-config Java support

Auto-detect Java projects from pom.xml / build.gradle — no config file needed
Read custom settings from pom.xml <properties> or gradle.properties (codeflash.* keys)
Multi-module Maven scanning: parses each module's <sourceDirectory> / <testSourceDirectory>, picks module with most Java files as source root
Deleted all codeflash.toml files
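The detection described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `detect_build_tool` and `read_codeflash_properties` are hypothetical names, the `codeflash.moduleRoot` key in the usage note is an invented example, and the parser only handles `key=value` lines rather than the full Java properties syntax.

```python
from __future__ import annotations

from pathlib import Path


def detect_build_tool(project_root: Path) -> str | None:
    """Return "maven" or "gradle" when a matching build file exists, else None."""
    if (project_root / "pom.xml").is_file():
        return "maven"
    if (project_root / "build.gradle").is_file():
        return "gradle"
    return None


def read_codeflash_properties(gradle_properties_text: str) -> dict[str, str]:
    """Collect codeflash.* keys from gradle.properties content (key=value lines only)."""
    settings: dict[str, str] = {}
    for raw_line in gradle_properties_text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and lines without an "=" separator.
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("codeflash."):
            settings[key[len("codeflash."):]] = value.strip()
    return settings
```

For example, a `gradle.properties` containing `codeflash.moduleRoot=src/main/java` would yield `{"moduleRoot": "src/main/java"}`, while unrelated keys such as `org.gradle.jvmargs` are ignored.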

Smart ReplayHelper (behavior + performance parity)

ReplayHelper.replay() now reads CODEFLASH_MODE env var and produces the same output as existing test instrumentation
Behavior mode: captures return value via Kryo, writes to SQLite test_results table for correctness comparison
Performance mode: runs inner loop for JIT warmup, prints timing markers matching the optimizer's expected format
No mode: just invokes (trace-only or manual testing)

Bug fixes

JFR parser: normalize / → . in class names (JVM internal format vs Java package format)
Graceful timeout: send SIGTERM before SIGKILL so JFR can dump recording and shutdown hooks run
TracingTransformer: remove isRecording() check that prevented instrumenting classes loaded during serialization (was causing 3 captures instead of 10,000+)
Replay test generator: JUnit 4 support (org.junit.Test vs org.junit.jupiter.api.Test), detect from project build config
Overloaded methods: global counter per method name to avoid duplicate replay test method names
Instrumentation: fix _add_behavior_instrumentation for compact @Test lines (annotation + signature on same line)
project_root: use build root directory (not sub-module) for multi-module Maven projects
optimize subparser: add_help=False so -h in Java commands isn't intercepted as --help
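To make the JFR parser item concrete: JFR reports class names in the JVM's internal slash-separated form, so they have to be converted before they can be compared against Java package names. A minimal sketch, with hypothetical helper names (`normalize_class_name`, `in_packages`) that are not taken from the PR:

```python
def normalize_class_name(jvm_name: str) -> str:
    """Convert a JVM-internal name (com/example/Foo) to package form (com.example.Foo)."""
    return jvm_name.replace("/", ".")


def in_packages(jvm_name: str, package_prefixes: list[str]) -> bool:
    """Check whether a JFR sample's class belongs to one of the project's packages."""
    name = normalize_class_name(jvm_name)
    return any(name == p or name.startswith(p + ".") for p in package_prefixes)
```

Without the normalization step, a prefix filter such as `com.aerospike.client` would never match a sample recorded as `com/aerospike/client/util/Crypto`.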

Validated end-to-end on aerospike-client-java

10,500+ invocations traced across 282 methods
41 functions ranked by JFR CPU profiling data
55 replay test files generated (JUnit 4 compatible)
Replay tests compile, run, and pass (129 tests for Crypto.computeDigest)
Behavior baseline established with timing data (4.81ms over 119 loops)
Candidates correctly verified and rejected when behavior doesn't match

Test plan

33 config detection tests (build tool, source/test root, Maven/Gradle properties, multi-module)
13 JFR parser tests (normalization, filtering, ranking, timeout, project_root)
10 replay test generation tests (JUnit 4/5, overloads, instrumentation)
8 tracer e2e tests (agent capture, replay generation, orchestration)
6 integration tests (full pipeline: discover → rank → compile)
2 replay test discovery tests
Full optimizer pipeline on aerospike benchmark: trace → discover → rank → optimize → verify

🤖 Generated with Claude Code

…iles

Java projects no longer need a standalone config file. Codeflash reads config from pom.xml <properties> or gradle.properties, and auto-detects source/test roots from build tool conventions.

Changes:
- Add parse_java_project_config() to read codeflash.* properties from pom.xml and gradle.properties
- Add multi-module Maven scanning: parses each module's pom.xml for <sourceDirectory> and <testSourceDirectory>, picks module with most Java files as source root, identifies test modules by name
- Route Java projects through build-file detection in config_parser.py before falling back to pyproject.toml
- Detect Java language from pom.xml/build.gradle presence (no config needed)
- Fix project_root for multi-module projects (was resolving to sub-module)
- Fix JFR parser / separators (JVM uses com/example, normalized to com.example)
- Fix graceful timeout (SIGTERM before SIGKILL for JFR dump + shutdown hooks)
- Remove isRecording() check from TracingTransformer (was preventing class instrumentation for classes loaded during serialization)
- Delete all codeflash.toml files from fixtures and code_to_optimize
- Add 33 config detection tests
- Update docs for zero-config Java setup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replay tests call helper.replay() via reflection, not the target function directly. The behavior instrumentation can't wrap indirect calls and produces malformed output (code emitted outside class body) for large replay test files. For replay tests, just rename the class without adding instrumentation — JUnit pass/fail results verify correctness. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Detect test framework from project build config and generate replay tests with appropriate imports (org.junit.Test for JUnit 4, org.junit.jupiter.api.Test for JUnit 5). Fixes compilation failures on projects using JUnit 4 (like aerospike-client-java). Also passes test_framework through run_java_tracer to generate_replay_tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ay tests Use a global counter per method name across all descriptors to generate unique test method names. Previously, overloaded methods (same name, different descriptor) would generate duplicate replay_methodName_N methods, causing compilation errors. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
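The naming scheme described in this commit can be sketched as below. It is an illustration under assumptions: the function name `assign_replay_test_names` and the zero-based starting index are hypothetical, and only the `replay_methodName_N` pattern comes from the commit message.

```python
from collections import defaultdict


def assign_replay_test_names(invocations: list[tuple[str, str]]) -> list[str]:
    """Name each (method_name, descriptor) capture as replay_<method>_<N>.

    The counter is keyed by method name only, so overloads that share a name
    but differ in descriptor still receive distinct indices.
    """
    counters: dict[str, int] = defaultdict(int)
    names = []
    for method_name, _descriptor in invocations:
        names.append(f"replay_{method_name}_{counters[method_name]}")
        counters[method_name] += 1
    return names
```

Keying the counter by `(method_name, descriptor)` instead would restart at the same index for every overload, which is the duplicate-name failure the commit describes.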

@AfterClass

…on skip

10 new tests covering:
- JUnit 5 replay test generation (imports, class visibility)
- JUnit 4 replay test generation (imports, public methods, @AfterClass)
- Overloaded method handling (no duplicate test method names)
- Instrumentation skip for replay tests (behavior + perf mode)
- Regular tests still get instrumented normally

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…solution

13 new tests covering:
- JFR class name normalization (/ to . conversion)
- Package-based sample filtering
- Addressable time calculation from JFR samples
- Method ranking order and format
- Graceful timeout (SIGTERM before SIGKILL)
- Multi-module project root detection (Path not str)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@Test

The behavior instrumentation was producing malformed output for compact @Test lines (annotation + method signature on same line, common in replay tests). The method signature collection loop would skip past the opening brace and consume subsequent methods' content. Fix: detect when the @Test annotation line already contains { and treat it as both annotation and method signature, avoiding the separate signature search that was over-consuming lines. Reverted the instrumentation skip for replay tests; they now get properly instrumented for both behavior capture and performance timing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
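The compact-line check can be illustrated with a small sketch. The helper name `find_signature_end` is hypothetical and the logic is deliberately simplified: it treats any `{` on the annotation line as the method's opening brace, which would misfire on annotations that themselves contain braces.

```python
def find_signature_end(lines: list[str], annotation_idx: int) -> int:
    """Return the index of the line holding the method's opening brace.

    If the annotation line already contains "{", the annotation and signature
    share one line, so no further lines are consumed.
    """
    if "{" in lines[annotation_idx]:
        return annotation_idx  # compact form: annotation + signature together
    for idx in range(annotation_idx + 1, len(lines)):
        if "{" in lines[idx]:
            return idx
    raise ValueError("no opening brace found after annotation")
```

Starting the search at the line after the annotation, without the first check, would skip the compact method's own brace and land on the next method's, which matches the over-consumption described above.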

claude · 2026-03-20T03:20:44Z

Claude finished @HeshamHM28's task in 4s

PR Review Summary

Prek Checks

✅ prek run --from-ref origin/main — clean, no issues.

Mypy reports 19 errors across models.py and discovery/functions_to_optimize.py, but these are all pre-existing (the PR only changes one line in models.py: runtime is not None). Not introduced by this PR.

Code Review

Binary file committed

codeflash/languages/java/resources/codeflash-runtime-1.0.0.jar is a binary file that changed in this PR (15.97 MB → 15.98 MB). This is expected since ReplayHelper.java and the tracer were updated, but binary files in git make diffs unreadable and permanently grow the repo size. Consider whether this JAR should be built from source during CI (via mvn package) and excluded from source control via .gitignore, or whether committing it is an intentional design choice for simplicity.

`getCallingTestClassName()` heuristic is fragile

ReplayHelper.java:getCallingTestClassName() walks the stack looking for a class name containing "ReplayTest" or "replay". Generated test classes are named ReplayTest_<classname> so this works for the happy path. However:

The fallback (stack[3].getClassName()) could return a JUnit framework internal class (org.junit.runners.*) rather than the test class when JUnit uses reflection-based invocation
For manually written tests that call helper.replay() from a class not matching these strings, the behavior mode will associate results with the wrong class name, causing comparison failures

A more robust approach would be to pass the test class name explicitly to replay() — but that would change the generated test format. At minimum, document this constraint.

Windows: SIGTERM not graceful

_run_java_with_graceful_timeout() in codeflash/languages/java/tracer.py:41 calls proc.send_signal(signal.SIGTERM). On Windows, this calls TerminateProcess() — the same as proc.kill(), so JFR can't dump its recording. The subsequent proc.kill() in the except handler is then a no-op on an already-dead process. Low severity (JFR/Java profiling is primarily Linux/macOS), but the code doesn't log a warning or handle the Windows case.
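One way to address this is sketched below. It is not the code in codeflash/languages/java/tracer.py: `run_with_graceful_timeout` and its `grace` parameter are hypothetical, and the Windows branch only logs a warning because `Popen.send_signal(signal.SIGTERM)` is an alias for `terminate()` there.

```python
import logging
import signal
import subprocess
import sys

logger = logging.getLogger(__name__)


def run_with_graceful_timeout(cmd: list[str], timeout: float, grace: float = 5.0) -> int:
    """Run cmd; on timeout send SIGTERM, wait `grace` seconds, then SIGKILL."""
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        pass

    if sys.platform == "win32":
        # On Windows SIGTERM maps to TerminateProcess(), so shutdown hooks
        # (and a JFR dump) will not run; at least make that visible.
        logger.warning("Graceful shutdown is unavailable on Windows; the JFR recording may be lost")
    proc.send_signal(signal.SIGTERM)
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()  # escalate only if the process ignored SIGTERM
        return proc.wait()
```

On POSIX systems a process terminated by SIGTERM reports a negative return code, so callers can distinguish a timeout from a normal exit.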

`detect_java_project` called twice at trace time

In codeflash/tracer.py:388-392, detect_java_project(project_root) is called to get test_framework. This is a redundant detection since the same config was already read during process_pyproject_config. If JavaProjectConfig.test_framework were propagated into the argparse Namespace during setup (similar to how tests_root and module_root are), this extra filesystem scan could be avoided. Minor, but worth a follow-up.

`replayBehavior` invocation ID format

In ReplayHelper.java:replayBehavior(), invId = testMethodName + "_" + testIteration. The comment says testIteration goes at the END so the comparator's lastUnderscore stripping can make baseline/candidate keys match. However, testMethodName itself could already contain underscores (e.g., replay_computeDigest_42). If the comparator strips only the last underscore segment, this should be fine — but this is subtle coupling between the Java runtime and the Python comparator that is not documented anywhere. Adding a comment linking to the Python-side comparator logic would help future maintainers.
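The coupling can be made explicit with a short sketch of both sides. The function names are hypothetical, and the comparator behavior is assumed to be "strip only the final underscore segment", as the comment above supposes; the real Python comparator is not shown in this thread.

```python
def make_invocation_id(test_method_name: str, test_iteration: int) -> str:
    """Mirror of the Java side: the iteration is appended last."""
    return f"{test_method_name}_{test_iteration}"


def comparison_key(invocation_id: str) -> str:
    """Assumed comparator behavior: drop only the last underscore segment."""
    base, sep, _iteration = invocation_id.rpartition("_")
    return base if sep else invocation_id
```

Because `rpartition` splits at the last underscore, a method name such as `replay_computeDigest_42` survives intact, and baseline and candidate runs with different iteration numbers map to the same key.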

Duplicate Detection

No duplicates detected. The new functions (_run_java_with_graceful_timeout, generate_replay_tests JUnit4 support, getCallingTestClassName/MethodName) are unique to their modules.

Test Coverage

All 23 new tests pass (test_jfr_parser.py: 13 tests, test_replay_test_generation.py: 10 tests).

| File | Coverage |
| --- | --- |
| `jfr_parser.py` | 71% |
| `replay_test.py` | 76% |
| `instrumentation.py` | 61% |
| `tracer.py` (java) | 33% |
| `maven_strategy.py` | 9% (largely pre-existing) |

tracer.py at 33% is low but most uncovered code is subprocess orchestration (_run_java_with_graceful_timeout, JavaTracer.trace()) that is inherently difficult to unit test. The graceful timeout logic has dedicated tests in TestGracefulTimeout. Acceptable for now.

Last updated: 2026-04-01T20:13Z

ReplayHelper now reads CODEFLASH_MODE env var and produces the same output as the existing test instrumentation:
- Behavior mode: captures return value via Kryo serialization, writes to SQLite (test_results table) for correctness comparison, prints start/end timing markers
- Performance mode: runs inner loop for JIT warmup, prints timing markers for each iteration matching the expected format
- No mode: just invokes the method (trace-only or manual testing)

This achieves feature parity with the existing test instrumentation for replay tests, which call functions via reflection and can't be wrapped by text-level instrumentation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ay tests + speedups
- Trigger on any codeflash/** or tests/** changes (not just java subset)
- Validate replay test files are discovered per-function
- Already validates: replay test generation, global discovery count, optimization success, and minimum speedup percentage

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The refactored Java project_root handling moved args.tests_root resolution after the project_root_from_module_root call, which passed a string instead of a Path. Restore the original order: resolve tests_root to Path first, then set test_project_root, then override both for Java multi-module projects. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Use Path comparisons instead of forward-slash substring matching
- Avoid parse_args() in test (reads stdin on Windows); use Namespace directly

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Use print(flush=True) instead of logging.info for subprocess output so CI logs show progress in real-time instead of buffering until completion. Also set PYTHONUNBUFFERED=1 for the subprocess. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
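A rough sketch of this streaming pattern follows. The helper name `stream_subprocess` is hypothetical, and returning the collected lines is an addition for illustration only.

```python
import os
import subprocess
import sys


def stream_subprocess(cmd: list[str]) -> tuple[int, list[str]]:
    """Run cmd and echo its output line by line as it arrives."""
    # PYTHONUNBUFFERED stops a Python child from block-buffering its stdout.
    env = {**os.environ, "PYTHONUNBUFFERED": "1"}
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, env=env
    )
    lines = []
    assert proc.stdout is not None
    for line in proc.stdout:
        print(line, end="", flush=True)  # flush so CI logs update in real time
        lines.append(line)
    return proc.wait(), lines
```

Both halves matter: the environment variable makes the child emit output promptly, and `flush=True` keeps the parent from holding it in its own buffer.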

…_write_gradle_properties Co-authored-by: Saurabh Misra <undefined@users.noreply.github.com>

…ions harder
- Set jdk.ExecutionSample#period=1ms (default was 10ms) so JFR captures samples from shorter-running programs
- Workload.main now runs 1000 rounds with larger inputs so JFR can capture method-level CPU samples (repeatString with O(n²) concat dominates ~75% of samples)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ove priority Replace xml.etree.ElementTree with text-based regex manipulation in _write_maven_properties() and _remove_java_build_config(). ElementTree destroys XML comments, mangles namespace declarations (ns0: prefixes), and reformats whitespace. The new approach reads/writes pom.xml as plain text, only touching codeflash.* property lines. Also extracts duplicated key_map to shared _MAVEN_KEY_MAP constant and aligns remove priority to check pom.xml first (matching write order). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mashraf-222 · 2026-03-24T16:30:42Z

Review & fix: pom.xml formatting preservation (`3c63b60`)

Reviewed the PR changes and identified one genuine issue in the config writer.

What was fixed

_write_maven_properties() and _remove_java_build_config() destroy pom.xml formatting — both used xml.etree.ElementTree which strips XML comments, mangles namespace declarations (xmlns → ns0: prefixes), and reformats all whitespace. Replaced with text-based regex manipulation that only touches codeflash.* property lines and preserves everything else. The new approach also detects existing indentation from sibling elements so inserted properties match the file's style.

Additionally:

Extracted duplicated key_map dict to shared _MAVEN_KEY_MAP module constant
Aligned _remove_java_build_config() to check pom.xml first, matching write priority
Added 9 tests (7 for config_writer, 2 for detector confirming zero-config intent)
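The text-based approach can be sketched as follows. This is an illustration of the technique rather than the code in commit 3c63b60: `set_maven_property` is a hypothetical name, the property key in the usage note is invented, and the value is inserted without XML escaping.

```python
import re


def set_maven_property(pom_text: str, key: str, value: str) -> str:
    """Insert or update <key>value</key> in <properties>, leaving other text untouched."""
    tag = re.escape(key)
    existing = re.compile(rf"(<{tag}>)(.*?)(</{tag}>)", re.DOTALL)
    if existing.search(pom_text):
        # Update in place; a callable replacement avoids backslash surprises.
        return existing.sub(lambda m: f"{m.group(1)}{value}{m.group(3)}", pom_text, count=1)

    close = re.search(r"^([ \t]*)</properties>", pom_text, re.MULTILINE)
    if close is None:
        raise ValueError("pom.xml has no <properties> block")
    # Copy indentation from the first child of <properties> when there is one.
    sibling = re.search(r"<properties>[^\n]*\n([ \t]*)<(?!/)", pom_text)
    indent = sibling.group(1) if sibling else close.group(1) + "    "
    new_line = f"{indent}<{key}>{value}</{key}>\n"
    return pom_text[: close.start()] + new_line + pom_text[close.start():]
```

Calling `set_maven_property(pom, "codeflash.moduleRoot", "src/main/java")` on a file with comments and custom indentation changes only the one property line, which is the property that an ElementTree round trip does not preserve.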

What was reviewed and left as-is

Confirmed the following are intentional design decisions, not bugs:

has_existing_config() returning True for any Java build file — this is the zero-config mechanism that skips the first-run setup flow
isRecording() guard removal in TracingTransformer — PR description explains this fixed captures going from 3 to 10,000+
add_help=False on optimize subparser — prevents argparse from intercepting -h meant for Java command passthrough
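The effect of `add_help=False` can be seen in a small standalone parser. This sketch does not reproduce codeflash's CLI: the `parse_cli` helper and the use of `parse_known_args` to collect the passthrough arguments are assumptions made for illustration.

```python
import argparse


def parse_cli(argv: list[str]) -> tuple[argparse.Namespace, list[str]]:
    """Parse the top-level command and return unrecognized args for passthrough."""
    parser = argparse.ArgumentParser(prog="codeflash")
    subparsers = parser.add_subparsers(dest="command")
    # Without add_help=False, "-h" after "optimize" would print help and exit.
    subparsers.add_parser("optimize", add_help=False)
    return parser.parse_known_args(argv)
```

With this setup, `parse_cli(["optimize", "java", "-h"])` returns normally and `-h` appears in the leftover arguments instead of triggering argparse's help output.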

Validation

All 152 setup tests pass (including 9 new)
All 8 tracer E2E tests pass
Full Fibonacci E2E optimization: 279x speedup found, mark-as-success 200
prek clean (ruff check + format)

…os (TODO-37) Java detection in parse_config_file() short-circuited before the existing depth-comparison logic, so a parent pom.xml would override a closer package.json or pyproject.toml. Now all config sources are detected first and the closest one to CWD wins. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…TODO-34, TODO-38)

TODO-34: TracingClassVisitor hardcoded line number to 0 because ASM's visitMethod() doesn't provide line info. Added a pre-scan pass in TracingTransformer.instrumentClass() that collects first line numbers via visitLineNumber() before the instrumentation pass.

TODO-38: Serialization timeouts/failures silently dropped captures with no visibility. Added AtomicInteger droppedCaptures counter and included it in flush() metadata output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Changed detect_packages_from_source() from min(2, len) to min(3, len) so com.aerospike.client.util produces prefix com.aerospike.client instead of com.aerospike. This reduces instrumentation to the actual source package instead of the entire organization namespace. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mashraf-222 · 2026-03-24T16:51:00Z

3 commits addressing the remaining open TODOs from the code review:

1. `5942ae9e` — Monorepo config priority

parse_config_file() used to short-circuit on the first Java build file found, before the existing depth-comparison logic for package.json/pyproject.toml. In a monorepo with a parent pom.xml and a child package.json, Java would incorrectly win. Now all config sources are detected first and the closest one to CWD wins.
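The "closest one wins" rule can be sketched as a walk up from the working directory. The names here (`CONFIG_MARKERS`, `closest_config`) are hypothetical, and the tie-break for several markers in the same directory (dictionary order) is an assumption, not something the commit specifies.

```python
from __future__ import annotations

from pathlib import Path

# Marker file -> language; the order decides ties within one directory (assumed).
CONFIG_MARKERS = {
    "pyproject.toml": "python",
    "package.json": "javascript",
    "pom.xml": "java",
    "build.gradle": "java",
}


def closest_config(cwd: Path) -> tuple[str, Path] | None:
    """Return (language, config_path) for the marker nearest to cwd, or None."""
    for directory in (cwd, *cwd.parents):
        for marker, language in CONFIG_MARKERS.items():
            candidate = directory / marker
            if candidate.is_file():
                return language, candidate
    return None
```

In a layout with `repo/pom.xml` and `repo/web/package.json`, running from `repo/web` resolves to the JavaScript config, whereas short-circuiting on the first Java build file would have picked the parent `pom.xml`.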

2. `12921447` — Tracer line numbers + dropped captures

Line numbers: TracingClassVisitor hardcoded 0 because ASM's visitMethod() doesn't provide line info. Added a pre-scan pass in instrumentClass() that collects first line numbers via visitLineNumber() before the instrumentation pass.
Dropped captures: Serialization timeouts silently dropped invocations with no tracking. Added AtomicInteger droppedCaptures counter and included it in flush metadata.

3. `970c9f86` — Package detection scope

Changed detect_packages_from_source() from 2 package components to 3, so com.aerospike.client.util → com.aerospike.client (was com.aerospike). Reduces instrumentation blast radius from entire organization to actual source package.
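The prefix computation amounts to keeping at most the first three dotted components. A minimal sketch, where `package_prefix` is a hypothetical stand-in for the relevant part of `detect_packages_from_source()`:

```python
def package_prefix(package: str, max_components: int = 3) -> str:
    """Keep at most the first max_components dotted parts of a package name."""
    parts = package.split(".")
    return ".".join(parts[: min(max_components, len(parts))])
```

With the previous limit of 2, `com.aerospike.client.util` collapsed to `com.aerospike`; with 3 it becomes `com.aerospike.client`, and shorter packages are returned unchanged.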

E2E validation

Fibonacci optimization: 258x speedup. Full pipeline (discovery → test gen → instrumentation → benchmarking → candidate evaluation) passed.

…orepo subdirectory scanning

Adapt find_all_config_files() after rebasing on java-config-redesign (PR #1880):
- Java detected via pom.xml/build.gradle instead of codeflash.toml
- Add subdirectory scan for monorepo language subprojects (java/, js/ etc.)
- Extract _check_dir_for_configs() to eliminate duplicated detection logic
- Fix --all flag in multi-language mode (module_root wasn't available during resolution)
- Add Java project_root directory override in apply_language_config()
- Update all tests to use build-tool detection mocks and directory-based Java paths
- Add 5 new monorepo discovery tests (subdir Java, subdir JS, all-three, skip-hidden, root-wins)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Zero-config Java support (auto-detection from build files, codeflash.toml elimination) is handled separately in cf-java-zero-config-strategy. This commit strips those changes, keeping only bug fixes:
- JFR parser, ReplayHelper, instrumentation, replay tests
- Multi-module test root resolution
- JUnit 4/5 test framework detection
- add_help=False for optimize subparser

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

CLAassistant · 2026-03-26T07:40:55Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ mashraf-222
❌ Ubuntu

Ubuntu seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

…candidate keys
2. JFR tool not found: missing JAVA_HOME fallback
3. JaCoCo coverage broken: -DargLine was overwriting JaCoCo's agent flag
4. runtime=0 dropped: `if result.runtime:` was falsy for a zero-nanosecond result
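The runtime=0 item is a common Python truthiness pitfall and can be shown in a few lines. The function names are hypothetical; only the `if result.runtime:` check and its `is not None` replacement come from this thread.

```python
from __future__ import annotations


def collect_runtimes_buggy(runtimes: list[int | None]) -> list[int]:
    """Truthiness check: silently drops a legitimate 0 ns measurement."""
    return [r for r in runtimes if r]


def collect_runtimes_fixed(runtimes: list[int | None]) -> list[int]:
    """Explicit None check: keeps 0, drops only missing values."""
    return [r for r in runtimes if r is not None]
```

For the input `[0, None, 5]` the first version keeps only `5`, while the second keeps `0` and `5`.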

- Walk up parent directories when looking for mvnw wrapper, fixing multi-module projects where mvnw is in the root but optimizer runs from a submodule
- Respect user's --no-pr flag in Java tracer path instead of hardcoding no_pr=True, allowing PR creation from tracer-based optimizations
- Add --no-pr to e2e tracer test script

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add mvn/gradle test suite examples, fix replay test description, document current limitations (void methods, mvnw search, --add-opens). Remove unverified claims. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

--no-pr is a top-level codeflash flag, not an optimize subcommand flag. Placing it after optimize caused it to be passed to the JVM as an unrecognized option. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…vior" This reverts commit 85e8a51.

…nction matching Replay test metadata now stores qualified names (e.g. Matrix4f.invertLocal) instead of short names (invertLocal). This prevents mismatches when multiple classes have methods with the same name, ensuring replay tests are correctly mapped to their source functions during optimization. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix(java): use qualified names in replay test metadata for correct function matching

misrasaurabh1 and others added 7 commits March 19, 2026 19:11

misrasaurabh1 changed the title ~~Java config redesign + bugfixs for Tracer~~ feat: zero-config Java projects + smart ReplayHelper for end-to-end optimization Mar 20, 2026

github-actions bot added the workflow-modified This PR modifies GitHub Actions workflows label Mar 20, 2026

misrasaurabh1 and others added 6 commits March 19, 2026 22:40

fix: Windows compatibility for Java config detection tests

74cbe2a


fix: add missing type params for dict in _write_maven_properties and …

803fb64

…_write_gradle_properties Co-authored-by: Saurabh Misra <undefined@users.noreply.github.com>

mashraf-222 and others added 3 commits March 24, 2026 16:50

HeshamHM28 and others added 3 commits March 27, 2026 10:38

Merge branch 'main' into java-config-redesign

5add169

style: remove extra blank line in cli.py

f3eecac

HeshamHM28 force-pushed the java-config-redesign branch from 01cb9f8 to f3eecac Compare March 31, 2026 09:23

HeshamHM28 and others added 6 commits April 1, 2026 06:46

Merge branch 'main' into java-config-redesign

c9c5f9d

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Revert "docs: update Java tracer documentation to match verified beha…

27535c1

…vior" This reverts commit 85e8a51.

HeshamHM28 marked this pull request as ready for review April 1, 2026 07:34

misrasaurabh1 and others added 2 commits April 1, 2026 12:45

Merge pull request #1939 from codeflash-ai/cf-fix-replay-test-discovery

6d65dd5

fix(java): use qualified names in replay test metadata for correct function matching

Merge branch 'main' into java-config-redesign

823300f

mashraf-222 approved these changes Apr 1, 2026

View reviewed changes

misrasaurabh1 merged commit 15a9261 into main Apr 1, 2026
65 of 75 checks passed

misrasaurabh1 deleted the java-config-redesign branch April 1, 2026 22:04

misrasaurabh1 merged 30 commits into main from java-config-redesign


4 participants
