feat: multi-language orchestration loop with per-language config discovery by mashraf-222 · Pull Request #1918 · codeflash-ai/codeflash

mashraf-222 · 2026-03-27T15:50:09Z

Summary

Adds multi-language support to the CLI. A single codeflash invocation now discovers and processes all configured languages (Python, Java, JS/TS) sequentially.

CF-1075

How it works

Config discovery (find_all_config_files in config_parser.py):

Walks upward from CWD — finds pyproject.toml (Python), package.json (JS/TS), pom.xml/build.gradle (Java)
Scans immediate subdirectories for monorepo support
Closest config wins per language, one project per language
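
The discovery order above can be sketched roughly as follows. This is a minimal illustration, not the PR's `find_all_config_files`: the names `CONFIG_MARKERS`, `check_dir`, and `discover_configs` are hypothetical, file contents are not parsed, and JS and TS are treated as one language. It assumes the upward walk runs before the subdirectory scan, so a config at or above the start directory wins over one in a subdirectory.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical marker-file table; the real mapping lives in config_parser.py.
CONFIG_MARKERS = {
    "pyproject.toml": "python",
    "package.json": "javascript",
    "pom.xml": "java",
    "build.gradle": "java",
}


def check_dir(dir_path: Path, found: dict[str, Path]) -> None:
    """Record the first config seen for each language; later hits are ignored."""
    for filename, language in CONFIG_MARKERS.items():
        if language not in found and (dir_path / filename).is_file():
            found[language] = dir_path / filename


def discover_configs(start_dir: Path) -> dict[str, Path]:
    """Return one config path per language, nearest directory first."""
    found: dict[str, Path] = {}
    current = start_dir.resolve()
    # Upward walk: directories nearer to start_dir are visited first, so they win.
    for dir_path in (current, *current.parents):
        check_dir(dir_path, found)
    # Monorepo support: scan immediate subdirectories of the start directory.
    for child in sorted(p for p in current.iterdir() if p.is_dir()):
        check_dir(child, found)
    return found
```

Because each language is recorded only once, "closest config wins" falls out of the visiting order rather than needing a separate comparison step.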

Orchestration loop (main.py):

For each discovered LanguageConfig, deep-copies args, applies language-specific config, runs optimizer
Error in one language does not block others
Git/GitHub checks run once before the loop
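
A minimal sketch of the loop described above, assuming each language config is a plain dict of argument overrides. `run_all_languages` and the `run_optimizer` callback are hypothetical stand-ins for the code in main.py, which works with `LanguageConfig` objects instead.

```python
from __future__ import annotations

import copy
from argparse import Namespace
from typing import Callable


def run_all_languages(
    args: Namespace,
    language_configs: dict[str, dict],
    run_optimizer: Callable[[Namespace], None],
) -> dict[str, str]:
    """Run the optimizer once per language and return a status summary."""
    results: dict[str, str] = {}
    for language, config in language_configs.items():
        # Deep copy so one language's settings never leak into the next run.
        lang_args = copy.deepcopy(args)
        for key, value in config.items():
            setattr(lang_args, key, value)
        try:
            run_optimizer(lang_args)
        except Exception as exc:  # isolate failures per language
            results[language] = f"failed: {exc}"
        else:
            results[language] = "ok"
    return results
```

Catching the exception inside the loop body is what keeps a failure in one language from blocking the others; the returned dict is one possible basis for summary logging.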

Key behaviors:

--file filters to the matching language automatically
--all re-resolves per language (each gets its own module_root)
Backward compatible: single-language projects use existing code path unchanged

Files changed

| File | What |
| --- | --- |
| `config_parser.py` | `LanguageConfig` dataclass, `find_all_config_files()`, `normalize_toml_config()` extraction |
| `cli.py` | `apply_language_config()`: maps LanguageConfig onto CLI args |
| `main.py` | Orchestration loop with per-language error isolation and summary logging |
| `test_git_utils.py` | Language-agnostic test updates |
| `test_registry.py` | Java registration in clear-functions test |
| `test_multi_config_discovery.py` | 16 tests for config discovery |
| `test_multi_language_orchestration.py` | 36 tests for orchestration, CLI routing, error handling |

Test plan

52 unit tests pass (pytest tests/test_multi_language_orchestration.py tests/test_multi_config_discovery.py -v)
prek run --from-ref origin/main passes
E2E: single-language Python project still works unchanged
E2E: multi-language project discovers all configs and processes each

claude · 2026-03-27T15:50:51Z

Claude finished @mashraf-222's task in 7m 57s

PR Review Summary

Triage PR size — LARGE (350+ production lines changed)
Run prek lint/typecheck
Resolve stale threads — no unresolved threads
Code review
Duplicate detection
Test coverage analysis
Post final summary

Prek Checks

ruff check and ruff format: PASS

Mypy shows 14 errors across config_parser.py and git_utils.py. Spot-checked: the config_parser.py errors (list_str_keys annotation, bool/str incompatibility in the normalization loops) are pre-existing patterns carried over from the inline code in the original parse_config_file. The git_utils.py errors are pre-existing (only line 152 changed in this PR). No new mypy regressions introduced.

Code Review

Fixed since last review cycle ✅

All three critical bugs from the previous review cycle are addressed, plus one related fix:

add_help=False removed from optimize subparser (cli.py)
package.json false positives: _check_dir_for_configs now requires a "codeflash" key (config_parser.py:212)
Filesystem walk stops at git root (config_parser.py:263)
filter_configs_for_file correctly excludes spurious configs when --file is specified (main.py:41-53)

Previous note corrected: "benchmark": False in normalize_toml_config's bool_keys is not new — the same key was already in the inline normalization code of parse_config_file on main. No issue here.

Still open — medium severity

init_sentry and posthog_cf.initialize_posthog are still called once per language inside the orchestration loop (main.py:181-182), not once before it. In every other code path (single-lang, verify-setup, command dispatch), they're called once. If telemetry init is idempotent the runtime impact is zero, but it's architecturally inconsistent and could cause issues if disable_telemetry differs between language configs (last-language setting wins for whichever call actually sticks).

Minor

filter_configs_for_file fallback (main.py:53): when --file is given but no config's project root contains it, the function returns all configs unchanged. This is acknowledged design, but means the "spurious config" scenario still triggers if neither config covers the file. Not a bug given the comment, but worth knowing.
check_running_in_git_repo (git_utils.py:149) signature is (module_root: str) but tests document it accepting None. The mypy error is pre-existing; just noting the annotation is slightly loose.

Duplicate Detection

No duplicates detected. normalize_toml_config is correctly shared between both the new multi-lang discovery path and the existing parse_config_file. The resolve_config_onto_args extraction properly eliminates the duplication that would have existed between process_pyproject_config and apply_language_config.

Test Coverage

59 new tests across three test files — all pass. Coverage on the new orchestration path:

| File | Coverage |
| --- | --- |
| `main.py` | 62% |
| `config_parser.py` | 47% |
| `cli_cmds/cli.py` | 31% |

The lower numbers reflect untouched legacy paths (init, auth, verify-setup, etc.) that are exercised in other test suites. The new paths added by this PR are well-covered by test_multi_language_orchestration.py and test_multi_config_discovery.py.

Missing test noted from previous review (still missing): a test verifying init_sentry/posthog are called exactly once in multi-language mode. Once the call is moved before the loop, this would be straightforward to add.
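
One possible shape for that test, sketched under the assumption that telemetry initialization has been hoisted out of the loop. `run_languages` is a hypothetical stand-in for the orchestration code in main.py; the real test would patch `init_sentry` and `posthog_cf.initialize_posthog` where main.py imports them.

```python
from __future__ import annotations

from typing import Callable, Iterable
from unittest.mock import MagicMock


def run_languages(
    language_configs: Iterable[dict],
    init_sentry: Callable[..., None],
    initialize_posthog: Callable[..., None],
    run_optimizer: Callable[[dict], None],
    disable_telemetry: bool = False,
) -> None:
    # Telemetry is set up once, from the top-level setting, before any language runs.
    init_sentry(enabled=not disable_telemetry, exclude_errors=True)
    initialize_posthog(enabled=not disable_telemetry)
    for config in language_configs:
        run_optimizer(config)


def test_telemetry_initialised_once() -> None:
    init_sentry, initialize_posthog, run_optimizer = MagicMock(), MagicMock(), MagicMock()
    run_languages(
        [{"language": "python"}, {"language": "java"}],
        init_sentry,
        initialize_posthog,
        run_optimizer,
    )
    # Two languages were processed, but each init hook ran exactly once.
    init_sentry.assert_called_once_with(enabled=True, exclude_errors=True)
    initialize_posthog.assert_called_once_with(enabled=True)
    assert run_optimizer.call_count == 2
```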

Last updated: 2026-04-01

…discovery
Adds the ability for codeflash CLI to discover and optimize multiple languages in a single run. When configs for Python, Java, and/or JS/TS are found, the optimizer runs once per language with isolated args.
- Add LanguageConfig dataclass and find_all_config_files() for upward walk + monorepo subdirectory scanning (Python/JS/Java)
- Extract normalize_toml_config() from inline parse_config_file logic
- Add apply_language_config() for per-language arg setup in cli.py
- Set language singleton early in process_pyproject_config()
- Add orchestration loop in main.py with --file language filtering, per-language error isolation, and summary logging
- Use build_config_strategy.parse_java_project_config() (from #1906)
- Make get_git_diff tests language-agnostic (no singleton dependency)
Intentionally omits auto_configure_language and detect_unconfigured_languages per Linear CF-1075 review.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

TypeScript projects discovered via package.json were incorrectly assigned Language.JAVASCRIPT. Now derives language from the config dict returned by parse_package_json_config() and deduplicates both JS and TS together.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Extract resolve_config_onto_args() from duplicated logic between process_pyproject_config and apply_language_config (~100 lines deduped). This fixes:
- Missing benchmark validation in apply_language_config (HIGH)
- Missing LSP guard in apply_language_config (MEDIUM)
- --all /specific/path silently overridden in orchestration loop (MEDIUM)
- Silent except Exception: pass in config discovery (MEDIUM)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

In the multi-language orchestration path, handle_optimize_all_arg_parsing runs before config is loaded, so args.git_remote is unset. This caused repo.remote(name=None) to raise ValueError, crashing every --file/--all run without --no-pr. Default to "origin" when git_remote is not set.
Also catch git.NoSuchPathError in check_running_in_git_repo and handle_optimize_all_arg_parsing so non-existent module_root paths return graceful errors instead of uncaught exceptions.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

aseembits93 · 2026-03-31T16:55:03Z

@claude review

claude · 2026-03-31T16:55:46Z

Claude finished @aseembits93's task in 2m 58s

PR Review

Summary

Solid feature work overall — the orchestration loop, config discovery, and test coverage are well-structured. Found several bugs and a few design issues that should be addressed before merging.

Bugs

1. benchmark added to normalize_toml_config bool_keys — wasn't there before

config_parser.py:123-127:

bool_keys = {
    "override-fixtures": False,
    "disable-telemetry": False,
    "disable-imports-sorting": False,
    "benchmark": False,  # ← NEW, wasn't in original
}

benchmark is a CLI flag, not a pyproject.toml config key. Adding it here means every normalized config dict now gets benchmark: False injected, which is misleading. It doesn't currently cause a bug (it's not in supported_keys so it won't be applied to args), but it's dead weight in the config dict and could cause confusion.

2. add_help=False on optimize subparser — likely unintentional

cli.py:413:

trace_optimize = subparsers.add_parser("optimize", help="...", add_help=False)

This removes -h/--help from the optimize subcommand and is not mentioned in the PR description. This is a regression: codeflash optimize --help will no longer work.

3. init_sentry and posthog called per language in the orchestration loop

main.py:162-163:

for lang_config in language_configs:
    ...
    init_sentry(enabled=not pass_args.disable_telemetry, exclude_errors=True)   # ← N times
    posthog_cf.initialize_posthog(enabled=not pass_args.disable_telemetry)       # ← N times

These are initialization calls that are called once in all other code paths. In the multi-language loop, they're called once per language. Even if idempotent, this is inconsistent and could cause issues if disable_telemetry differs between language configs (the last language's setting wins). Move them before the loop.

Design Issues

4. set_current_language singleton overwritten in each loop iteration

cli.py:144-145:

if language is not None:
    set_current_language(language)

resolve_config_onto_args is called per language, each time overwriting the global singleton. After the loop completes, the singleton holds the last processed language. If any code reads current_language() outside the optimizer (e.g., in cleanup, logging, or telemetry), it sees a stale/wrong value. This is a pre-existing concern but the multi-language loop amplifies it.

5. find_all_config_files walks to filesystem root with no depth limit

config_parser.py:242-248:

dir_path = start_dir.resolve()
while True:
    _check_dir_for_configs(dir_path, configs, seen_languages)
    parent = dir_path.parent
    if parent == dir_path:
        break
    dir_path = parent

On developer machines this will routinely scan /home/user, /home, /, etc. A pyproject.toml at /home/user/pyproject.toml (not uncommon for global tools) would be picked up. Consider stopping at the git repo root (git rev-parse --show-toplevel) or at a configurable max depth.
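
A bounded walk along those lines might look like the sketch below. It assumes the repository root can be recognized by a `.git` entry in the directory rather than by shelling out to `git rev-parse --show-toplevel`; `dirs_up_to_repo_root` is a hypothetical helper, not a function from this PR.

```python
from __future__ import annotations

from collections.abc import Iterator
from pathlib import Path


def dirs_up_to_repo_root(start_dir: Path) -> Iterator[Path]:
    """Yield start_dir and its ancestors, stopping after the directory holding .git."""
    current = start_dir.resolve()
    for dir_path in (current, *current.parents):
        yield dir_path
        # .git is a directory in a normal clone and a file in a worktree or submodule.
        if (dir_path / ".git").exists():
            return
```

If no `.git` entry is found, this sketch still walks to the filesystem root, so a caller that wants a hard bound would need an additional depth limit.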

6. No first-run experience in the multi-config path

main.py:99-119:

language_configs = find_all_config_files()
if not language_configs:
    loaded_args = _handle_config_loading(args)  # ← first-run only in this branch
    ...

If find_all_config_files() returns any config (even one with a bare pom.xml or package.json), the first-run experience via _handle_config_loading is completely bypassed. A user with only a Java or JS project who hasn't done codeflash init will silently skip setup.

7. _parse_java_config_for_dir is a redundant wrapper

config_parser.py:163-166:

def _parse_java_config_for_dir(dir_path: Path) -> dict[str, Any] | None:
    from codeflash.languages.java.build_config_strategy import parse_java_project_config
    return parse_java_project_config(dir_path)

This wrapper only exists to defer the import, but the same pattern is already used inline elsewhere. It's also the only reason _parse_java_config_for_dir needs to be mockable in tests (the tests patch codeflash.code_utils.config_parser._parse_java_config_for_dir). The wrapper is fine for testability, but it's worth documenting that purpose.

8. original_all save/restore logic is fragile

main.py:135-154: The manual capture of original_all before handle_optimize_all_arg_parsing mutates args.all (to None when module_root is absent) and then per-language restoration is correct but subtle. A reader unfamiliar with the mutation would miss why original_all is needed. The comment helps, but consider whether handle_optimize_all_arg_parsing can be split into a "git checks" part and an "args mutation" part to make the intent clearer.

Test Coverage

52 unit tests are good coverage for the new code paths
The test for test_root_config_wins_over_subdir correctly tests the "closest config wins" invariant
Missing: test for the --all "" (no path) vs --all /path distinction in the orchestration loop's original_all handling
Missing: test that init_sentry/posthog are called exactly once in multi-language mode
The test_git_utils.py cleanup (removing set_current_language singleton setup from tests) is a good improvement

Minor

config_parser.py:194: pyproject.open("rb") followed by tomlkit.parse(f.read()) — this pattern is borrowed from parse_config_file and works because tomlkit handles bytes, but it's undocumented. Not a bug.
git_utils.py:152: (git.InvalidGitRepositoryError, git.NoSuchPathError) — correct fix.
cli.py:263: getattr(args, "git_remote", None) or "origin" — clean fix for the git_remote=None case.
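
The `getattr(...) or "origin"` idiom quoted above covers both an attribute that was never set and one that is explicitly None. A small sketch, with `effective_remote` as a hypothetical wrapper name:

```python
from argparse import Namespace


def effective_remote(args: Namespace) -> str:
    # Missing attribute, None, and empty string all fall back to "origin".
    return getattr(args, "git_remote", None) or "origin"
```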

HeshamHM28 · 2026-04-01T08:21:37Z

Code Review: Bug Report

Ran a thorough static analysis and runtime verification of this PR. Found 7 issues (2 critical, 2 medium, 3 low).

Bug 1 (Critical): Any `package.json` triggers JS config discovery — even without `codeflash` section

Files: config_js.py:262, config_parser.py:209-214

parse_package_json_config() auto-detects and returns a full config for any package.json, even without a "codeflash" key. In _check_dir_for_configs, this means a Python project with a package.json for dev tools (prettier, eslint, etc.) will falsely trigger JS optimization.

Contrast with Python, which correctly requires [tool.codeflash] in pyproject.toml.

Repro:

# package.json: {"name": "my-project", "devDependencies": {"prettier": "^3.0"}}
# No "codeflash" section — but parse_package_json_config still returns a full config
result = parse_package_json_config(path)  # result is NOT None

Fix: _check_dir_for_configs should check for "codeflash" key in package.json before treating it as a JS config, or parse_package_json_config should return None when no codeflash section exists.
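
The first option could be sketched as a small pre-check, shown here as an illustration only. `has_codeflash_section` is a hypothetical helper; it assumes the opt-in marker is a top-level "codeflash" key and treats unreadable or malformed files as not opted in.

```python
import json
from pathlib import Path


def has_codeflash_section(package_json: Path) -> bool:
    """Return True only if package.json parses and has a top-level "codeflash" key."""
    try:
        data = json.loads(package_json.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return False
    return isinstance(data, dict) and "codeflash" in data
```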

Bug 2 (Critical): `testsRoot` path resolution is wrong in `config_js.py:279`

# Line 279 (BUGGY):
config["tests_root"] = str(project_root / Path(codeflash_config["testsRoot"]).resolve())
#                                          ^^^ resolves relative to CWD, THEN joins with project_root

# Line 273 (CORRECT pattern for moduleRoot):
config["module_root"] = str((project_root / Path(codeflash_config["moduleRoot"])).resolve())
#                            ^^^ joins first, THEN resolves

Verified: With CWD=/home/user, project_root=/home/user/project, testsRoot="tests":

Buggy: /home/user/tests (Path("tests").resolve() is already absolute, so joining it onto project_root discards project_root)
Correct: /home/user/project/tests

Fix: str((project_root / Path(codeflash_config["testsRoot"])).resolve())
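
The difference between the two orderings can be reproduced in isolation. In this sketch the function names are hypothetical and only the two path expressions mirror the lines quoted from config_js.py. It relies on documented pathlib behavior: `Path.resolve()` makes a relative path absolute against the current working directory, and joining an absolute path onto another path discards the left-hand side.

```python
from pathlib import Path


def resolve_then_join(project_root: Path, tests_root: str) -> str:
    # Path(...).resolve() makes the path absolute against the CWD first;
    # joining an absolute path onto project_root then discards project_root.
    return str(project_root / Path(tests_root).resolve())


def join_then_resolve(project_root: Path, tests_root: str) -> str:
    # Anchor the relative path at project_root, then normalise.
    return str((project_root / Path(tests_root)).resolve())
```

With the working directory set to the parent of `project_root`, the first function returns a path under the working directory and the second returns a path under `project_root`.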

Bug 3 (Medium): Java `tests_root` auto-detection uses wrong parent level

File: cli.py:152-153

test_path = Path(args.module_root).parent / test_dir if "/" in test_dir else Path(test_dir)

For module_root=/project/src/main/java, test_dir="src/test/java":

.parent → /project/src/main
Result: /project/src/main/src/test/java — wrong
Expected: /project/src/test/java

Fix: Resolve relative to the project root or config_path, not module_root.parent.

Bug 4 (Medium): JS/TS `tests_root` auto-detection is CWD-relative, not project-relative

File: cli.py:162-164

for test_dir in ["test", "tests", "__tests__"]:
    if Path(test_dir).is_dir():  # checks relative to CWD, not project root
        args.tests_root = test_dir  # stores bare relative string

In a monorepo where CWD != project root, this finds the wrong directory. Also, the bare string "test" stored in args.tests_root will fail the Path(args.tests_root).is_dir() assertion at line 178 if CWD changes between detection and assertion.
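
A project-relative variant might look like the following sketch. `detect_tests_root` and `TEST_DIR_CANDIDATES` are hypothetical names; the candidate list is taken from the snippet above, and returning an absolute path is an assumption made so that later `is_dir()` checks do not depend on the working directory.

```python
from __future__ import annotations

from pathlib import Path

TEST_DIR_CANDIDATES = ("test", "tests", "__tests__")


def detect_tests_root(project_root: Path) -> Path | None:
    """Return the first conventional test directory under project_root, if any."""
    for name in TEST_DIR_CANDIDATES:
        candidate = project_root / name
        if candidate.is_dir():
            # Absolute path: stays valid even if the CWD changes afterwards.
            return candidate.resolve()
    return None
```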

Bug 5 (Low): `handle_optimize_all_arg_parsing` sets `args.all = None` wastefully in multi-lang path

Files: cli.py:272, main.py:139

In the multi-language path, handle_optimize_all_arg_parsing runs before any language config is applied. args.module_root is None at that point, so args.all = args.module_root sets it to None. This is mitigated by main.py:148-154 which re-resolves per language, but the mutation is wasted work and the comment at main.py:132-134 acknowledges this oddity.

Bug 6 (Low): Type annotation mismatch for `check_running_in_git_repo`

File: git_utils.py:149

Signature is (module_root: str) -> bool but called with None at cli.py:73. Works because git.Repo(None, ...) accepts None, but annotation should be str | None.

Bug 7 (Low/Design): JS/TS configs skip `normalize_toml_config` normalization

File: config_parser.py:209-214

Python configs go through normalize_toml_config() (defaults, key conversion, validation). JS/TS configs come from parse_package_json_config() with separate normalization logic. While both produce underscored keys, the divergent paths could cause subtle inconsistencies (e.g., language key present in JS but absent in Python configs).

Summary: Bugs 1-2 are the most impactful — Bug 1 causes false JS discovery in Python-only projects, and Bug 2 produces wrong test paths for any JS/TS project with explicit testsRoot. Bugs 3-4 affect auto-detection correctness for Java and JS/TS projects.

- Stop upward config walk at git repo root to prevent picking up configs from parent directories outside the project
- Require "codeflash" key in package.json for JS/TS multi-language discovery, matching Python's [tool.codeflash] opt-in model
- Remove accidental add_help=False from optimize subparser
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mashraf-222 · 2026-04-01T09:22:13Z

Addressed in ad661a7:

Bug 1 (package.json false positives): Fixed — _check_dir_for_configs now requires a "codeflash" key in package.json, matching Python's [tool.codeflash] opt-in model.
add_help=False: Fixed — removed, was accidental.
Filesystem root walk: Fixed — upward walk now stops at git repo root.

Not addressed (pre-existing, not introduced by this PR):

Bug 2 (testsRoot resolution): Real bug in config_js.py:279 but file not touched by this PR. Separate fix needed.
Bugs 3-4 (CWD-relative tests_root): Inherited from existing single-language code path. Not a regression.
Bugs 5-7: Low/design — args.all=None is handled by optimize_all_requested, type annotation is cosmetic, separate normalization paths are by design.

@HeshamHM28

…vered project root
When running `--file` from a repo containing subdirectories with build files (e.g. codeflash-java-runtime/ with pom.xml), config discovery picks up those subdirs as language configs. This causes a ValueError crash because the target file isn't within the spurious project root.
Two fixes:
- main.py: filter_configs_for_file() excludes configs whose project root doesn't contain the target file
- cli.py: only override project_root for Java when module_root is within config_path (defense-in-depth)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mashraf-222 · 2026-04-01T10:14:47Z

Fixed in b320ef9. Two changes:

main.py: Added filter_configs_for_file() — when --file is specified, excludes discovered configs whose project root doesn't contain the target file. Falls back to all configs if none match.
cli.py: Java project_root override now only applies when module_root is actually within config_path. Previously it was unconditional, which broke when the user's --module-root pointed elsewhere.
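
The filtering rule in the first change can be sketched as below. This is an illustration that operates on bare project-root paths, not the PR's `filter_configs_for_file`, which takes config objects; `filter_roots_for_file` is a hypothetical name, and `Path.is_relative_to` requires Python 3.9 or later.

```python
from __future__ import annotations

from pathlib import Path


def filter_roots_for_file(project_roots: list[Path], target_file: Path) -> list[Path]:
    """Keep only the roots that contain target_file; fall back to all roots if none do."""
    target = target_file.resolve()
    matching = [root for root in project_roots if target.is_relative_to(root.resolve())]
    return matching or project_roots
```

The fallback means a file outside every discovered root still reaches the optimizer with the full config list, which matches the behavior described above.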

Added 3 tests covering the spurious config scenario. E2E validated all 3 reviewer repro commands (Python, Java fixture, JS fixture path).

KRRT7 mentioned this pull request on Mar 27, 2026: perf: optimize context extraction with caching and import pre-filtering #1921 (Merged)

mashraf-222 and others added 3 commits March 31, 2026 08:14

mashraf-222 force-pushed the cf-1075-multi-language-orchestration branch from 03d890d to bc6444a Compare March 31, 2026 08:16

mashraf-222 and others added 2 commits March 31, 2026 08:42

fix: resolve mypy type errors in resolve_config_onto_args and main

0c08b28

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mashraf-222 force-pushed the cf-1075-multi-language-orchestration branch from d21a088 to ad661a7 Compare April 1, 2026 09:18

mashraf-222 wants to merge 7 commits into main from cf-1075-multi-language-orchestration

3 participants
