
Integrate AIDE ML code optimization#2258

Open
efsiatras wants to merge 14 commits into oumi-ai:main from efsiatras:efsiatras/add-aide-agentic-optimization

Conversation

@efsiatras
Contributor

Description

This PR adds a new oumi aide command that brings agentic code optimization to Oumi using AIDE, the open-source ML engineering agent by Weco AI.

Why AIDE

Unlike oumi tune, which searches a predefined hyperparameter space, AIDE uses an LLM to write, execute, and iteratively improve Python code through tree search (draft → debug → improve). This lets it explore code-level changes that parameter search cannot reach, such as reward function design, evaluation logic, and training strategies.
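To make the draft → debug → improve loop concrete, here is a simplified sketch of how a tree-search policy might pick its next action. It is modeled loosely on the greedy policy in the open-source aideml repository, but it is not AIDE's code: Node, select_action, and the knobs num_drafts, debug_prob, and max_debug_depth are illustrative names chosen for this example.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)  # identity equality: nodes are distinct tree positions
class Node:
    """One candidate script in the search tree (illustrative, not AIDE's class)."""

    is_buggy: bool
    metric: Optional[float] = None  # higher is better in this sketch
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children

    @property
    def debug_depth(self) -> int:
        """How many consecutive buggy ancestors this node already has."""
        depth, node = 0, self.parent
        while node is not None and node.is_buggy:
            depth, node = depth + 1, node.parent
        return depth


def select_action(
    nodes: List[Node],
    num_drafts: int,
    debug_prob: float,
    max_debug_depth: int,
    rng: random.Random,
) -> Tuple[str, Optional[Node]]:
    """Return ("draft", None), ("debug", node) or ("improve", node)."""
    # 1. Seed the tree with a fixed number of independent drafts (root nodes).
    if sum(n.parent is None for n in nodes) < num_drafts:
        return "draft", None
    # 2. With probability debug_prob, try to fix a buggy leaf that is not too deep.
    if rng.random() < debug_prob:
        debuggable = [
            n for n in nodes
            if n.is_buggy and n.is_leaf and n.debug_depth < max_debug_depth
        ]
        if debuggable:
            return "debug", rng.choice(debuggable)
    # 3. Otherwise greedily improve the best working solution, or draft again.
    good = [n for n in nodes if not n.is_buggy and n.metric is not None]
    if not good:
        return "draft", None
    return "improve", max(good, key=lambda n: n.metric)
```

In AIDE itself, each selected action turns into an LLM call that writes or edits a script, which is then executed and scored; here only the selection step is modeled.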

AIDE implements the algorithm from arXiv:2502.13138 and has been independently validated by OpenAI (MLE-bench, 4x more medals than the best linear agent across 75 Kaggle competitions), Meta (LLM Speedrunning, AI Research Agents), Sakana AI (AI Scientist-v2), and METR (RE-Bench).

This PR integrates the open-source research version (aideml), which runs fully locally. Weco AI also offers a separate production platform with experiment tracking and a hybrid cloud architecture.

Architecture

The integration mirrors the existing Optuna pattern:

  • core/configs/params/aide_params.py → AideParams (mirrors TuningParams)
  • core/configs/aide_config.py → AideConfig (mirrors TuningConfig)
  • core/agentic/base_agentic_optimizer.py → BaseAgenticOptimizer ABC (mirrors BaseTuner)
  • core/agentic/aide_optimizer.py → AideOptimizer (mirrors OptunaTuner)
  • core/agentic/workspace_helper.py → Helper script injected into workspace
  • builders/agentic.py → build_agentic_optimizer() (mirrors build_tuner)
  • aide.py → Orchestration (mirrors tune.py)
  • cli/aide_cmd.py → CLI command (mirrors cli/tune.py)
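A minimal sketch of how a BaseTuner-style abstract base class and builder could fit together. The method names get_best_solution() and cleanup() appear in this PR's commit notes, while step(), the AideResult fields, ToyOptimizer, and the string-keyed registry are hypothetical; the real build_agentic_optimizer() presumably takes an AideConfig rather than a backend name.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class AideResult:
    """Illustrative result container; the PR's real AideResult may differ."""

    best_code: Optional[str]
    best_metric: Optional[float]
    num_steps: int
    extra: Dict[str, Any] = field(default_factory=dict)


class BaseAgenticOptimizer(ABC):
    """Tuner-style interface: advance the search, then report the best result."""

    @abstractmethod
    def step(self) -> None:
        """Run one agent iteration (draft, debug, or improve)."""

    @abstractmethod
    def get_best_solution(self) -> AideResult:
        """Return the best solution found so far."""

    def cleanup(self) -> None:
        """Release resources (processes, env vars); a no-op by default."""


class ToyOptimizer(BaseAgenticOptimizer):
    """Stand-in backend whose metric simply grows with each step."""

    def __init__(self) -> None:
        self.metrics: List[float] = []

    def step(self) -> None:
        self.metrics.append(float(len(self.metrics)))

    def get_best_solution(self) -> AideResult:
        if not self.metrics:  # nothing has run yet: return an empty result
            return AideResult(best_code=None, best_metric=None, num_steps=0)
        best = max(self.metrics)
        return AideResult(f"# solution with metric {best}", best, len(self.metrics))


# Hypothetical registry keyed by backend name, in the spirit of a build_tuner factory.
_BACKENDS: Dict[str, Callable[[], BaseAgenticOptimizer]] = {"toy": ToyOptimizer}


def build_agentic_optimizer(backend: str) -> BaseAgenticOptimizer:
    """Instantiate a registered backend or fail with a clear error."""
    try:
        return _BACKENDS[backend]()
    except KeyError:
        raise ValueError(f"Unknown agentic optimizer backend: {backend!r}") from None
```

Keeping the backend behind an ABC lets the orchestration code drive any optimizer through one step/report/cleanup lifecycle.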

Optimization surfaces

  • CONFIG_SEARCH — modifies training hyperparameters (learning rate, optimizer, LoRA rank, etc.)
  • REWARD_FUNCTION — designs reward functions for GRPO/RLHF
  • EVAL_FUNCTION — generates custom evaluation functions
  • FULL_PIPELINE — writes complete training scripts with no constraints
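The four surfaces map naturally onto an enum (the commit notes name it AideOptimizationSurface) plus a per-surface task description. The sketch below assumes lowercase string values and invents the goal wording and the build_task_description() helper; it only illustrates how a single surface setting could select the prompt the agent receives.

```python
from enum import Enum
from typing import Union


class AideOptimizationSurface(str, Enum):
    """The four surfaces listed above (string values are assumptions)."""

    CONFIG_SEARCH = "config_search"
    REWARD_FUNCTION = "reward_function"
    EVAL_FUNCTION = "eval_function"
    FULL_PIPELINE = "full_pipeline"


# Hypothetical per-surface goals that would be prepended to the agent's task prompt.
_GOALS = {
    AideOptimizationSurface.CONFIG_SEARCH:
        "Modify training hyperparameters (learning rate, optimizer, LoRA rank).",
    AideOptimizationSurface.REWARD_FUNCTION:
        "Design a reward function for GRPO/RLHF training.",
    AideOptimizationSurface.EVAL_FUNCTION:
        "Write a custom evaluation function.",
    AideOptimizationSurface.FULL_PIPELINE:
        "Write a complete training script; no constraints apply.",
}


def build_task_description(
    surface: Union[AideOptimizationSurface, str], metric: str
) -> str:
    """Compose a task prompt; raises ValueError for an unknown surface string."""
    surface = AideOptimizationSurface(surface)  # accepts a member or its value
    return f"{_GOALS[surface]}\nTarget metric: {metric}."
```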

Workspace helper

Instead of exposing raw Oumi APIs to the AIDE agent, we inject an oumi_helper.py into its workspace at runtime. The agent calls run_trial(), test_reward(), or test_eval() — these handle config loading from YAML, environment setup, and metric extraction. This works with any model, trainer, or config.
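A sketch of the injection step, assuming the helper is written as a plain Python file next to a copy of the base YAML config. HELPER_SOURCE, the run_trial(config_path, overrides) signature, and prepare_workspace() are hypothetical; the PR's oumi_helper.py also provides test_reward(), test_eval(), and metric extraction.

```python
import shutil
from pathlib import Path

# Hypothetical, heavily reduced helper source; the PR's oumi_helper.py does far more.
HELPER_SOURCE = '''\
"""Injected into the AIDE workspace; agent code imports this module."""


def run_trial(config_path, overrides=None):
    """Placeholder: the real helper loads the YAML, trains, and returns metrics."""
    raise NotImplementedError
'''


def prepare_workspace(workspace: Path, base_config: Path) -> Path:
    """Create the workspace, write oumi_helper.py, and copy the base YAML into it."""
    workspace.mkdir(parents=True, exist_ok=True)
    (workspace / "oumi_helper.py").write_text(HELPER_SOURCE)
    copied = workspace / base_config.name
    shutil.copyfile(base_config, copied)
    return copied
```

Code executed from a script inside the workspace directory can then import oumi_helper like any local module, so the agent never has to assemble Oumi config objects itself.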

Dependency handling

aideml pins exact versions of pandas/numpy/scipy that conflict with Oumi's requirements, but the code runs fine with newer versions. For uv users, override-dependencies in pyproject.toml resolves this automatically. For pip users, the workaround is documented in the extra definition.

What's included

Source — 17 new files following Oumi conventions (copyright headers, dataclass configs, try/except optional imports, builder pattern, device cleanup, distributed support, telemetry).
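The try/except optional-import convention keeps aideml out of Oumi's core install. A function-based variant of that pattern is sketched below; require_optional() is a hypothetical helper, and the assumption that aideml is imported as "aide" should be checked against the installed package.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency, or fail with an install hint for its extra."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:  # also covers ModuleNotFoundError
        raise ImportError(
            f"'{module_name}' is not installed. "
            f"Install it with: pip install 'oumi[{extra}]'"
        ) from exc
```

For example, require_optional("aide", "aide") would either return the module or raise an ImportError telling the user to install the aide extra.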

Tests — 59 tests across 3 files:

  • test_aide_params.py — param validation, YAML roundtrip, CLI overrides
  • test_aide_optimizer.py — task descriptions for all 4 surfaces, config conversion, optimizer lifecycle
  • test_cli_aide.py — CLI with mocked AIDE runs

Notebooks — 3 tutorials:

  • AIDE Agentic Optimization — CONFIG_SEARCH, the main tutorial
  • AIDE Reward Function Design — tests existing reward as baseline, lets AIDE redesign it
  • AIDE Custom Evaluation — dataset exploration, eval pipeline breakdown, AIDE generation

Config and examples:

  • configs/recipes/smollm/aide/135m/aide.yaml with smollm-135m alias
  • scripts/examples/aide/run_aide_optimization.py
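The smollm-135m alias lets the CLI accept a short name instead of the full recipe path. A minimal sketch of that lookup follows; the alias table and resolve_aide_config() are hypothetical, and because this PR does not describe how oumi:// paths are resolved, the sketch simply maps them onto a local repository root.

```python
from pathlib import Path

# Hypothetical alias table; Oumi additionally scopes aliases by AliasType (e.g. AIDE).
_AIDE_ALIASES = {
    "smollm-135m": "configs/recipes/smollm/aide/135m/aide.yaml",
}
_OUMI_PREFIX = "oumi://"


def resolve_aide_config(name: str, repo_root: Path) -> Path:
    """Map an alias or oumi:// URI to a config path; pass plain paths through."""
    if name in _AIDE_ALIASES:
        return repo_root / _AIDE_ALIASES[name]
    if name.startswith(_OUMI_PREFIX):
        return repo_root / name[len(_OUMI_PREFIX):]
    return Path(name)  # treat anything else as a plain filesystem path
```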

How to test

# Install
uv pip install -e ".[aide]"

# Unit tests (no GPU or API key needed)
pytest tests/unit/core/configs/params/test_aide_params.py tests/unit/core/agentic/test_aide_optimizer.py tests/unit/cli/test_cli_aide.py -v

# CLI
oumi aide --help

# End-to-end (requires LLM access + GPU)
oumi aide -c smollm-135m --aide.steps=5
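The --aide.steps=5 override appears to target the steps field of the aide section of the config. Oumi parses such overrides with its own config loader; the hand-rolled parser below only illustrates how a dotted key walks nested param dataclasses. Every field except steps, and all default values, are assumptions.

```python
from dataclasses import dataclass, field, fields
from typing import Any, List


@dataclass
class AideSearchParams:
    num_drafts: int = 5  # hypothetical field
    debug_prob: float = 0.5  # hypothetical field


@dataclass
class AideParams:
    steps: int = 20  # field overridden in the CLI example; default assumed
    search: AideSearchParams = field(default_factory=AideSearchParams)


@dataclass
class AideConfig:
    aide: AideParams = field(default_factory=AideParams)


def _cast(current: Any, raw: str) -> Any:
    """Cast a CLI string to the type of the field's current value."""
    if isinstance(current, bool):  # bool("false") would be True, so parse explicitly
        return raw.lower() in {"1", "true", "yes"}
    return type(current)(raw)


def apply_cli_overrides(config: Any, args: List[str]) -> None:
    """Apply --a.b.c=value style overrides to nested dataclasses in place."""
    for arg in args:
        key, _, raw = arg.removeprefix("--").partition("=")
        *parents, leaf = key.split(".")
        target = config
        for name in parents:  # walk down to the dataclass that owns the leaf
            target = getattr(target, name)
        if leaf not in {f.name for f in fields(target)}:
            raise KeyError(f"unknown config field: {key!r}")
        setattr(target, leaf, _cast(getattr(target, leaf), raw))
```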

Related issues

New feature; no existing issue.

Before submitting

  • This PR only changes documentation.
  • Did you read the contributor guideline Pull Request guidelines?
  • Did you link the issue(s) related to this PR in the section above?
  • Did you add / update tests where needed?

Introduce configuration dataclasses for the AIDE ML integration, following
the same pattern as the Optuna/TuningParams integration:
- AideParams, AideLLMParams, AideSearchParams, AideExecParams in
  core/configs/params/aide_params.py (mirrors tuning_params.py)
- AideConfig in core/configs/aide_config.py (mirrors tuning_config.py)
- AideOptimizationSurface enum for 4 surfaces: CONFIG_SEARCH,
  REWARD_FUNCTION, EVAL_FUNCTION, FULL_PIPELINE
- Exports from core/configs/__init__.py
- AliasType.AIDE for CLI config resolution
- 32 unit tests covering validation, YAML roundtrip, CLI overrides

Core integration layer following the Optuna/BaseTuner pattern:
- BaseAgenticOptimizer ABC in core/agentic/ (mirrors BaseTuner)
- AideOptimizer with try/except import, wraps AIDE Agent/Journal/Interpreter
- build_agentic_optimizer() factory in builders/agentic.py (mirrors build_tuner)
- aide.py top-level orchestration with full lifecycle: dir creation, logging
  setup, telemetry, device info, search loop, distributed cleanup
- CLI command oumi aide with device_cleanup and limit_per_process_memory
- aide() exported from oumi.__init__ (mirrors tune())
- AideResult dataclass for structured optimization results
- aide optional extra in pyproject.toml (mirrors tune extra for optuna)
- All 4 surfaces: CONFIG_SEARCH, REWARD_FUNCTION, EVAL_FUNCTION, FULL_PIPELINE
- SmolLM 135M AIDE recipe: configs/recipes/smollm/aide/135m/aide.yaml
- smollm-135m alias with AliasType.AIDE for CLI config resolution
- Example script: scripts/examples/aide/run_aide_optimization.py
  demonstrating programmatic AIDE usage (mirrors custom_evaluation.py)
- 13 unit tests for AideOptimizer: task description building for all
  4 surfaces, OmegaConf config conversion, optimizer init/cleanup,
  search summary, empty journal handling, AideResult construction
- pytest.skip pattern for optional aideml dependency (mirrors optuna tests)
- Fixed empty journal handling in get_best_solution()
- Add tests/unit/cli/test_cli_aide.py with 5 CLI tests (mirrors
  test_cli_synth.py pattern: basic run, overrides, result display,
  missing config error, oumi:// prefix resolution)
- Add complete_aide_config() to cli/completions.py for shell tab completion
- Wire autocompletion to CLI config option
- Fix get_search_summary() crash on empty journal (try/except + type cast)
- 50 tests passing across params, optimizer, and CLI test files
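The notes above mention fixes for empty-journal crashes in get_best_solution() and get_search_summary(). Here is a sketch of a summary function that tolerates an empty or all-buggy journal; JournalEntry and the summary keys are hypothetical stand-ins, and this version guards with explicit checks instead of the try/except the commit describes.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class JournalEntry:
    """Hypothetical stand-in for one node in AIDE's journal."""

    is_buggy: bool
    metric: Optional[float] = None


def get_search_summary(journal: List[JournalEntry]) -> Dict[str, Any]:
    """Summarize a search without assuming any node succeeded."""
    good = [e for e in journal if not e.is_buggy and e.metric is not None]
    best = max(good, key=lambda e: e.metric) if good else None
    return {
        "total_nodes": len(journal),
        "buggy_nodes": sum(e.is_buggy for e in journal),
        # Cast explicitly so callers always get a plain float or None.
        "best_metric": float(best.metric) if best is not None else None,
    }
```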
…nts:
- Workspace helper (oumi_helper.py) with run_trial(), test_reward(), test_eval(), get_config_fields(); the agent imports these instead of the raw Oumi API, eliminating pad_token/dataclass/path bugs
- Copy base config YAML into workspace with multi-root path resolution
- Fix isatty crash (OUMI_DISABLE_RICH_LOGGING + try/except in logging.py)
- Fix save_run Path crash, missing OmegaConf fields (exp_name etc.)
- Restore env var in cleanup(), add error logging, fix type hints
- Detailed step logging: action, plan, error output, analysis
- uv override-dependencies for clean aideml>=0.2.2 install
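The note about restoring an env var in cleanup() suggests a save-and-restore pattern, presumably for a variable such as OUMI_DISABLE_RICH_LOGGING from the isatty fix. A minimal sketch follows; the EnvOverride class and the value "1" are assumptions, not the PR's code.

```python
import os
from typing import Optional


class EnvOverride:
    """Set an env var for the optimizer's lifetime and restore it in cleanup()."""

    def __init__(self, name: str, value: str) -> None:
        self.name = name
        self._previous: Optional[str] = os.environ.get(name)  # None if unset
        os.environ[name] = value

    def cleanup(self) -> None:
        if self._previous is None:
            os.environ.pop(self.name, None)  # the variable did not exist before
        else:
            os.environ[self.name] = self._previous  # put the old value back
```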
@gitar-bot

gitar-bot bot commented Mar 15, 2026

Important

Upgrade your plan to unlock code review, CI analysis, custom rules, and more.

efsiatras marked this pull request as ready for review on March 15, 2026 at 16:08
Comment thread src/oumi/core/agentic/workspace_helper.py Outdated
Comment thread src/oumi/core/agentic/workspace_helper.py
@gitar-bot

gitar-bot bot commented Apr 11, 2026

Gitar is working


@oelachqar
Contributor

Hi @efsiatras,

Thank you so much for the contribution! Were you able to use the integration? It would be interesting to see some results demonstrating its effectiveness for LLM fine-tuning.

