Changes from all commits (86 commits)
- 1b4c179 Update README.md with arxiv (RobertTLange, Sep 25, 2025)
- 2fb7548 add google gemini embeding model (takeruhukushima, Sep 25, 2025)
- 27af71c fix: Fix database summary when patch_name metadata is missing (dexhunter, Sep 25, 2025)
- 9586cdb Update README.md (RobertTLange, Sep 26, 2025)
- 396c66a Merge pull request #2 from dexhunter/fix/display (RobertTLange, Sep 26, 2025)
- a60bc9e docs: change repo name on the onboarding doc (Koki-Kazaore, Sep 28, 2025)
- 0003552 Update README (Aladoro, Sep 28, 2025)
- be2e203 Added a doc explaining how to add suport for a local LLM and embeddin… (vicruz99, Oct 12, 2025)
- bf0c1d4 Add rust to supported languages (LiaCastaneda, Oct 13, 2025)
- 77d1819 Ensure setuptools discovers subpackages (iwiwi, Oct 14, 2025)
- 929f072 Mark shinka.webui as a package (iwiwi, Oct 14, 2025)
- 59a338c Merge pull request #18 from SakanaAI/fix-packaging (RobertTLange, Oct 15, 2025)
- 23ace36 fix apply_full.py when the patch has incomplete (0,1) markers instead… (51616, Oct 24, 2025)
- 06209a2 Merge pull request #21 from 51616/fix-full-patch-no-markers-bug (RobertTLange, Oct 27, 2025)
- c9c468b Merge pull request #12 from vicruz99/feature/local-models (RobertTLange, Oct 27, 2025)
- c5b1abe Update README.md (RobertTLange, Oct 27, 2025)
- ccc1326 Merge branch 'main' into lia/add-support-for-rust (RobertTLange, Oct 27, 2025)
- e8ef6de Merge pull request #15 from LiaCastaneda/lia/add-support-for-rust (RobertTLange, Oct 27, 2025)
- d2211b2 Merge pull request #7 from Koki-Kazaore/docs/change_repo_name (RobertTLange, Oct 27, 2025)
- ded4576 Update inspirations.py - archive (RobertTLange, Oct 27, 2025)
- 7ceea8c Merge pull request #1 from takeruhukushima/main (RobertTLange, Oct 27, 2025)
- ee6e8a5 Update dependencies gemini embed (RobertTLange, Oct 27, 2025)
- a759778 Update dbase.py path default (RobertTLange, Oct 30, 2025)
- c097a88 Fix reasoning token sampling (RobertTLange, Oct 30, 2025)
- 6d5e208 Fix anthropic budget sampling (RobertTLange, Oct 30, 2025)
- 9b4d7c7 fix shinka_launch --help (RobertTLange, Nov 2, 2025)
- d7a3f7e fix wrap_eval catch (RobertTLange, Nov 2, 2025)
- 397e0fd add documentation for resuming experiments (RobertTLange, Nov 2, 2025)
- f6896dc fix OAI dependency db for visualization (RobertTLange, Nov 2, 2025)
- 94a2805 Merge pull request #28 from SakanaAI/fix_minor (RobertTLange, Nov 2, 2025)
- 1d9d498 Fix init program island copying -> archive (RobertTLange, Nov 2, 2025)
- 2f01b3e fix:GEMINI_API_KEY name error (takeruhukushima, Nov 3, 2025)
- 12738f2 Merge pull request #29 from takeruhukushima/rename_gemini_api (RobertTLange, Nov 3, 2025)
- f5f7e68 use dependency-groups.dev (ifsheldon, Nov 8, 2025)
- 14739fc Add support for Claude Sonnet 4.5 (claude-sonnet-4-5-20250929) (arun-pathiban-ddog, Nov 8, 2025)
- 7dd7245 Merge pull request #35 from ifsheldon/dev-group (RobertTLange, Nov 8, 2025)
- 4f0708b Merge pull request #36 from arun-pathiban-ddog/add-claude-sonnet-4.5-… (RobertTLange, Nov 8, 2025)
- ed9f51f Add Swift language support (jeethu, Nov 3, 2025)
- 0437118 ignore warning for correct behavior when no improvement is detected, … (Aladoro, Nov 11, 2025)
- 831ddf6 Merge pull request #40 from SakanaAI/ignore-logsubtract-warning (RobertTLange, Nov 11, 2025)
- 259e786 Allow boolean flags for eval jobs (jm424, Nov 12, 2025)
- 8a615a4 Merge pull request #41 from jm424/jai/allow_eval_job_bool_flags (RobertTLange, Nov 13, 2025)
- 3251a70 Add json support (jeremycochoy, Nov 17, 2025)
- 1ac33cc Merge pull request #46 from jeremycochoy/feature/json_support (RobertTLange, Nov 19, 2025)
- 3fb579c Merge branch 'main' into jeethu/swift (RobertTLange, Nov 19, 2025)
- 929090c Merge pull request #37 from jeethu/jeethu/swift (RobertTLange, Nov 19, 2025)
- ed8f1b4 llm: Add GPT-5.1 and Gemini 3 Pro models (jm424, Nov 19, 2025)
- 70e485f Merge pull request #48 from jm424/jai/add-newer-models (RobertTLange, Nov 20, 2025)
- ecf762b Update README.md (RobertTLange, Nov 22, 2025)
- c686d7f Update getting_started.md (RobertTLange, Nov 22, 2025)
- bad5b37 Update apply_diff.py (RobertTLange, Dec 3, 2025)
- e12fe6b feat: Agentic backend core and routing logic (GeorgeWingg, Dec 7, 2025)
- bd46743 feat: Add multi-file diff viewer and agentic node indicator (GeorgeWingg, Dec 14, 2025)
- 729ac1a feat: Add Boids Flocking multi-file example (GeorgeWingg, Dec 7, 2025)
- e7faefe fix: Remove embedded script tag breaking HTML parser (GeorgeWingg, Dec 14, 2025)
- 15d579f fix: Align TerminalRenderer signature with MatplotlibRenderer (GeorgeWingg, Dec 7, 2025)
- ea6e91e fix: harden agentic backends and config (GeorgeWingg, Dec 14, 2025)
- 23915e0 feat: codex headless auth (device + api key) (GeorgeWingg, Dec 14, 2025)
- a860e08 fix: prefer subscription auth for codex (GeorgeWingg, Dec 14, 2025)
- ec6307e fix: correct embedding corpus args for agentic files (GeorgeWingg, Dec 14, 2025)
- 810e318 feat: propagate multi-file workspace between generations (GeorgeWingg, Dec 14, 2025)
- 1fda8e3 fix: hydrate workspace for legacy multi-file patches (GeorgeWingg, Dec 14, 2025)
- 6639b62 feat: integrate bandit sampling with agentic mode (GeorgeWingg, Dec 14, 2025)
- fdee648 feat: add boids_flocking_agentic variant and fix config merging (GeorgeWingg, Dec 14, 2025)
- 05c6313 chore: add gpt-5.2 pricing entry and PR validation plan (GeorgeWingg, Dec 15, 2025)
- 3efa551 style: apply black/isort formatting to changed files (GeorgeWingg, Dec 15, 2025)
- af31cf7 fix: agentic prompt architecture - CLI harness owns system prompt (GeorgeWingg, Dec 15, 2025)
- 7e4b3f4 fix: handle missing EVOLVE-BLOCK markers in embedding (GeorgeWingg, Dec 15, 2025)
- b5a34c6 fix: fail loudly when no model configured instead of silent fallback (GeorgeWingg, Dec 15, 2025)
- 700575d fix: add logging for silent fallbacks in cost, credentials, embedding (GeorgeWingg, Dec 15, 2025)
- 0946ee4 docs: update EXECPLAN with silent fallback fixes (GeorgeWingg, Dec 15, 2025)
- a54e3cc fix: full parallelism for agentic mode - thread-safe job submission (GeorgeWingg, Dec 17, 2025)
- fc71a31 feat: add circle_packing_agentic variant config (GeorgeWingg, Dec 17, 2025)
- 20d01c5 fix: enable parallelism with legacy evaluator (GeorgeWingg, Dec 17, 2025)
- 1a08a6a fix: correct flag not being stored in agentic evaluator (GeorgeWingg, Dec 17, 2025)
- 0cf887c fix: execute all bash blocks in agent responses (GeorgeWingg, Dec 17, 2025)
- 5172385 fix: Codex backend event limit and DictConfig serialization bugs (GeorgeWingg, Dec 18, 2025)
- 6577b8c chore: remove dead code from embedding_corpus.py (GeorgeWingg, Dec 18, 2025)
- b5fcd5f chore: remove unused session registry module (GeorgeWingg, Dec 18, 2025)
- d80bff2 chore: remove PR planning document (GeorgeWingg, Dec 18, 2025)
- 36c448d chore: remove unused TerminalRenderer from boids example (GeorgeWingg, Dec 18, 2025)
- 71e8cd3 chore: remove duplicate PROVIDER_ENV_VAR_MAP from shinka_agent (GeorgeWingg, Dec 18, 2025)
- 4dde4d8 feat: add agentic test suite and config cleanup (GeorgeWingg, Dec 18, 2025)
- 92dbada fix: correct import order in codex_cli.py (GeorgeWingg, Dec 18, 2025)
- 8390cf3 fix: use available model names in agentic configs (GeorgeWingg, Dec 18, 2025)
- 3b9ad16 feat: add multi-file embedding corpus support for novelty detection (GeorgeWingg, Dec 18, 2025)
3 changes: 3 additions & 0 deletions .gitignore
@@ -173,3 +173,6 @@ cython_debug/

# PyPI configuration file
.pypirc
results/
examples/boids_flocking/metrics.json
examples/boids_flocking/correct.json
17 changes: 9 additions & 8 deletions README.md
@@ -7,16 +7,16 @@
<img src="https://img.shields.io/badge/python-%3E%3D3.10-blue" />
<a href="https://github.com/SakanaAI/ShinkaEvolve/blob/master/LICENSE.md"><img src="https://img.shields.io/badge/license-Apache2.0-blue.svg" /></a>
<a href="https://github.com/astral-sh/ruff"><img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json" /></a>
<a href="http://arxiv.org/abs/2212.04180"><img src="http://img.shields.io/badge/paper-arxiv.2212.04180-B31B1B.svg" /></a>
<a href="http://arxiv.org/abs/2509.19349"><img src="http://img.shields.io/badge/paper-arxiv.2509.19349-B31B1B.svg" /></a>
<a href="https://colab.research.google.com/github/SakanaAI/ShinkaEvolve/blob/main/examples/shinka_tutorial.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
</p>


`ShinkaEvolve` is a framework that combines Large Language Models (LLMs) with evolutionary algorithms to drive scientific discovery. By leveraging the creative capabilities of LLMs and the optimization power of evolutionary search, `ShinkaEvolve` enables automated exploration and improvement of scientific code. The system is inspired by the [AI Scientist](https://sakana.ai/ai-scientist/), [AlphaEvolve](https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/) and the [Darwin Goedel Machine](https://sakana.ai/dgm/): It maintains a population of programs that evolve over generations, with an ensemble of LLMs acting as intelligent mutation operators that suggest code improvements.
[`ShinkaEvolve`](https://arxiv.org/abs/2509.19349) is a framework that combines Large Language Models (LLMs) with evolutionary algorithms to drive scientific discovery. By leveraging the creative capabilities of LLMs and the optimization power of evolutionary search, `ShinkaEvolve` enables automated exploration and improvement of scientific code. The system is inspired by the [AI Scientist](https://sakana.ai/ai-scientist/), [AlphaEvolve](https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/) and the [Darwin Goedel Machine](https://sakana.ai/dgm/): It maintains a population of programs that evolve over generations, with an ensemble of LLMs acting as intelligent mutation operators that suggest code improvements.

The framework supports **parallel evaluation of candidates** locally or on a Slurm cluster. It maintains an archive of successful solutions, enabling knowledge transfer between different evolutionary islands. `ShinkaEvolve` is particularly well-suited for scientific tasks where there is a verifier available and the goal is to optimize performance metrics while maintaining code correctness and readability.
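The loop this paragraph describes (a population of programs evolved by LLM mutation operators, with an archive of the best solutions) can be sketched as a toy stand-in; all names here are illustrative and this is not Shinka's actual API:

```python
import random

def mutate(program):
    """Stand-in for an LLM mutation operator: perturb one numeric 'program'."""
    return [p + random.uniform(-1, 1) for p in program]

def evaluate(program):
    """Toy verifier: fitness is negative squared distance from the target [3, 3]."""
    return -sum((p - 3) ** 2 for p in program)

def evolve(generations=200, seed=0):
    """Minimal single-island evolutionary loop with a best-so-far archive."""
    random.seed(seed)
    population = [[0.0, 0.0]]
    archive = []  # only appended to when a child improves on the archive tip
    for _ in range(generations):
        parent = max(population, key=evaluate)  # greedy parent selection
        child = mutate(parent)
        population.append(child)
        if not archive or evaluate(child) > evaluate(archive[-1]):
            archive.append(child)
    return archive
```

In the real system the mutation step is an ensemble of LLMs proposing code patches, evaluation runs the task's verifier in parallel (locally or on Slurm), and multiple islands share solutions through the archive.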

![](docs/conceptual.png)
![evolution](https://github.com/user-attachments/assets/22cf3468-17fe-4995-9e13-d602b490a54e)

## Documentation 📝

@@ -26,6 +26,7 @@ The framework supports **parallel evaluation of candidates** locally or on a Slu
| 📓 **[Tutorial Notebook](examples/shinka_tutorial.ipynb)** | Interactive walkthrough of Shinka features | Hands-on examples, configuration, best practices |
| ⚙️ **[Configuration](docs/configuration.md)** | Comprehensive configuration reference | All config options, optimization settings, advanced features |
| 🎨 **[WebUI](docs/webui.md)** | Interactive visualization and monitoring | Real-time tracking, result analysis, debugging tools |
|🕹️ **[Local LLM Support](https://github.com/SakanaAI/ShinkaEvolve/blob/main/docs/support_local_llm.md)**| Instructions for Local LLMs | How to setup local LLMs on your machine|

## Installation & Quick Start 🚀

@@ -52,9 +53,9 @@ For detailed installation instructions and usage examples, see the [Getting Star
| Example | Description | Environment Setup |
|---------|-------------|-------------------|
| ⭕ [Circle Packing](examples/circle_packing) | Optimize circle packing to maximize radii. | `LocalJobConfig` |
| 🤖 [Agent Design](examples/agent_design) | Design agent scaffolds for math tasks. | `LocalJobConfig` |
| 🤖 [Agent Design](examples/adas_aime) | Design agent scaffolds for math tasks. | `LocalJobConfig` |
| 🎯 [ALE-Bench](examples/ale_bench) | Code optimization for ALE-Bench tasks. | `LocalJobConfig` |
| ✨ [Novelty Generator](examples/novelty_generator_bck) | Generate creative, surprising outputs (e.g., ASCII art). | `LocalJobConfig` |
| ✨ [Novelty Generator](examples/novelty_generator) | Generate creative, surprising outputs (e.g., ASCII art). | `LocalJobConfig` |


## `shinka` Run with Python API 🐍
@@ -308,9 +309,9 @@ If you use `ShinkaEvolve` in your research, please cite it as follows:

```
@article{lange2025shinka,
title={ShinkaEvolve: Towards Open-Ended and Sample-Efficient Program Evolution},
title={ShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution},
author={Lange, Robert Tjarko and Imajuku, Yuki and Cetin, Edoardo},
journal={arXiv preprint},
journal={arXiv preprint arXiv:2509.19349},
year={2025}
}
```
3 changes: 2 additions & 1 deletion configs/cluster/local.yaml
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
job_config:
_target_: shinka.launch.LocalJobConfig
eval_program_path: ${distributed_job_config.eval_program_path}

eval_command: ${oc.select:distributed_job_config.eval_command,null}

evo_config:
job_type: "local"
4 changes: 2 additions & 2 deletions configs/config.yaml
@@ -2,9 +2,9 @@ defaults:
- _self_
- database@_global_: island_small
- evolution@_global_: small_budget
- task@_global_: mad_tf
- task@_global_: circle_packing
- cluster@_global_: local
- variant@_global_: mad_tf_example
- variant@_global_: circle_packing_example

verbose: false
results_dir: results
45 changes: 45 additions & 0 deletions configs/evolution/agentic.yaml
@@ -0,0 +1,45 @@
evo_config:
_target_: shinka.core.EvolutionConfig
agentic_mode: true
# LLM models for patch generation (used by bandit sampling)
llm_models:
- "gpt-4.1"
- "claude-sonnet-4-20250514"
- "gemini-2.5-flash"
llm_dynamic_selection: ucb
embedding_model: "text-embedding-3-small"
num_generations: 2
max_parallel_jobs: 1
agentic:
_target_: shinka.core.runner.AgenticConfig
backend: "shinka"
cli_profile: null
sandbox: "workspace-write"
approval_mode: "full-auto"
max_turns: 50
max_seconds: 0
cli_path: null
extra_cli_config:
# Model used for agentic editing sessions
# REQUIRED: Will fail if not set (no silent fallbacks to old models)
model: "gpt-4.1"
resume_parent_session: false
# Use /tmp to isolate scratch dirs from git repos, preventing Codex CLI
# from discovering parent AGENTS.md files. Set to null to use results_dir.
scratch_dir_base: "/tmp/shinka_scratch"
evaluator:
_target_: shinka.core.runner.EvaluatorConfig
mode: auto
agentic:
_target_: shinka.core.runner.AgenticEvaluatorConfig
# If null, inherits backend from agentic.backend
backend: null
sandbox: "workspace-write"
approval_mode: "full-auto"
max_events: 80
max_seconds: 0
extra_cli_config:
model: "gpt-4.1"
# Custom evaluation criteria (null for default quantitative eval)
eval_prompt: null
results_dir: ${output_dir}
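Variant files later in this PR override nested fields of this base evolution config (e.g. swapping the `model` inside `agentic.extra_cli_config`). The real merge is handled by Hydra/OmegaConf, but the deep-merge semantics can be sketched in plain Python; the dicts below are small hand-copied subsets of the YAML, and `deep_merge` is an illustrative helper, not a Shinka function:

```python
def deep_merge(base, override):
    """Recursively merge `override` into `base`; override wins on leaves."""
    merged = dict(base)
    for key, value in override.items():
        if key in merged and isinstance(merged[key], dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value  # leaves (including lists) are replaced wholesale
    return merged

# Subset of configs/evolution/agentic.yaml
base = {
    "llm_models": ["gpt-4.1", "claude-sonnet-4-20250514", "gemini-2.5-flash"],
    "agentic": {"backend": "shinka", "extra_cli_config": {"model": "gpt-4.1"}},
}
# Subset of configs/variant/circle_packing_agentic.yaml
variant = {
    "llm_models": ["gemini-2.5-flash"],
    "agentic": {"extra_cli_config": {"model": "gemini-2.5-flash"}},
}
cfg = deep_merge(base, variant)
```

Note that untouched sibling keys (here `agentic.backend`) survive the merge, while overridden lists replace the base list entirely rather than concatenating.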
55 changes: 55 additions & 0 deletions configs/task/boids_flocking.yaml
@@ -0,0 +1,55 @@
# Boids Flocking Task Configuration
# Task: Evolve flocking behavior to minimize collisions while maintaining tight grouping

# Task metadata (used by UI/logging)
task:
task_name: boids_flocking
description: |
Optimize the Boids flocking simulation. The goal is to evolve the separation,
alignment, and cohesion behaviors to:
1. Minimize collisions between boids
2. Maintain tight grouping (cohesion)
3. Achieve good velocity alignment

The simulation runs for 1000 steps with 50 boids. Improve the scoring function,
behavior weights, and physics parameters to achieve a higher combined score.
exec_fname: main.py
init_support_dir: examples/boids_flocking
language: python
metrics_fname: metrics.json
correct_fname: correct.json
score_key: combined_score
higher_is_better: true
allowed_files:
- boid.py
- simulation.py
- render.py
- main.py
primary_file: main.py

# Evolution config overrides (merged into global evo_config)
evo_config:
init_program_path: "examples/boids_flocking/main.py"
task_sys_msg: |
You are an expert in emergent behavior simulation and evolutionary algorithms.
Optimize the Boids flocking simulation to achieve:
1. Minimize collisions between boids (separation)
2. Maintain tight grouping (cohesion)
3. Achieve good velocity alignment

The simulation runs 1000 steps with 50 boids. You can edit multiple files:
- main.py: Entry point and configuration
- boid.py: Individual boid behavior
- simulation.py: Simulation loop and physics
- render.py: Visualization (optional)

Focus on tuning behavior weights, perception radius, and force calculations.
language: python
init_support_dir: examples/boids_flocking
job_type: local

distributed_job_config:
eval_program_path: "examples/boids_flocking/main.py"
# Don't set eval_command - let framework pass --results_dir dynamically

exp_name: shinka_boids_flocking
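The task config above points the evaluator at `metrics.json` (read via `score_key: combined_score`) and `correct.json` in the results directory. A minimal sketch of what the evaluated program would write to satisfy that contract; the exact JSON schema and the helper name here are assumptions, not taken from the Shinka codebase:

```python
import json
import os
import tempfile

def write_results(results_dir, combined_score, correct):
    """Write the metrics/correctness files the task config points at."""
    os.makedirs(results_dir, exist_ok=True)
    # metrics.json must contain the key named by score_key in the task config
    with open(os.path.join(results_dir, "metrics.json"), "w") as f:
        json.dump({"combined_score": combined_score}, f)
    # correct.json carries the boolean validity flag
    with open(os.path.join(results_dir, "correct.json"), "w") as f:
        json.dump({"correct": correct}, f)

results_dir = tempfile.mkdtemp()
write_results(results_dir, combined_score=72.5, correct=True)
```

Since `higher_is_better: true`, the framework would rank candidates by this `combined_score` value directly.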
2 changes: 2 additions & 0 deletions configs/task/circle_packing.yaml
@@ -30,6 +30,8 @@ evo_config:
7. The math literature suggests special arrangements for specific values of n

Be creative and try to find a new solution.

IMPORTANT: Your solution must be in main.py - this is the file that gets evaluated.
language: "python"
init_program_path: "examples/circle_packing/initial.py"
job_type: "slurm_conda"
13 changes: 13 additions & 0 deletions configs/variant/boids_flocking.yaml
@@ -0,0 +1,13 @@
# Variant configuration for Boids Flocking task
# This defines default overrides for the boids task

defaults:
- /task: boids_flocking
- /evolution: small_budget

variant_suffix: "_boids"

# Task-specific evolution overrides
evo_config:
# Enable agentic mode for multi-file editing
agentic_mode: false # Set to true for agentic experiments
74 changes: 74 additions & 0 deletions configs/variant/boids_flocking_agentic.yaml
@@ -0,0 +1,74 @@
# Variant configuration for Boids Flocking task with agentic editing
# This enables the multi-turn agentic backend for multi-file evolution

defaults:
- override /task@_global_: boids_flocking
- override /evolution@_global_: agentic

variant_suffix: "_boids_agentic"
exp_name: "shinka_boids_flocking"

# Override evo_config with boids-specific values (applied last)
evo_config:
init_program_path: "examples/boids_flocking/main.py"
init_support_dir: examples/boids_flocking
max_score: 100.0
num_generations: 30
max_parallel_jobs: 2
llm_models:
- "gemini-2.5-flash"
agentic:
extra_cli_config:
model: "gemini-2.5-flash"
task_sys_msg: |
You are an expert in emergent behavior simulation and evolutionary algorithms.
Optimize the Boids flocking simulation to achieve beautiful, natural flocking behavior.

The simulation runs 1000 steps with 50 boids. You can edit multiple files:
- main.py: Entry point and configuration
- boid.py: Individual boid behavior
- simulation.py: Simulation loop and physics
- render.py: Visualization (optional)

Focus on creating emergent patterns, smooth motion, and natural group dynamics.
evaluator:
agentic:
extra_cli_config:
model: "gemini-2.5-flash"
eval_prompt: |
Evaluate this boids simulation using BOTH quantitative metrics AND code quality.

## Part 1: Performance Metrics (0-50 points)
Run the simulation and read the ACTUAL metrics from stdout.

**Collision Avoidance** (0-20 points):
- 0 collisions = 20 pts | <100 = 15 pts | <500 = 10 pts | <1000 = 5 pts | >=1000 = 0 pts

**Alignment** (0-15 points): Read final alignment_score (0.0-1.0)
- >=0.95 = 15 pts | >=0.85 = 12 pts | >=0.70 = 8 pts | <0.70 = 4 pts

**Cohesion** (0-15 points): Read final cohesion_score (0.0-1.0)
- >=0.70 = 15 pts | >=0.50 = 12 pts | >=0.30 = 8 pts | <0.30 = 4 pts

## Part 2: Solution Quality (0-50 points)
Review the code in boid.py, simulation.py, and main.py.

**Algorithm Elegance** (0-20 points):
- Novel/creative approach to flocking behavior?
- Clean separation of concerns?
- Efficient force calculations?
- Smart use of spatial partitioning or other optimizations?

**Parameter Tuning** (0-15 points):
- Well-reasoned weight values for separation/alignment/cohesion?
- Appropriate perception/separation radii?
- Good balance between stability and responsiveness?

**Code Quality** (0-15 points):
- Readable and well-structured?
- No hacky workarounds or magic numbers without explanation?
- Would this scale to more boids?

IMPORTANT: Base performance scores on ACTUAL simulation output, not guesses.
combined_score = Part 1 + Part 2 (0-100)
correct = true if simulation runs without crashes
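Part 1 of this rubric is fully specified by thresholds, so its scoring can be computed mechanically from the simulation's metrics; a direct transcription of the three point tables (function names are illustrative):

```python
def collision_points(collisions):
    """Collision Avoidance, 0-20 points."""
    if collisions == 0:
        return 20
    if collisions < 100:
        return 15
    if collisions < 500:
        return 10
    if collisions < 1000:
        return 5
    return 0

def alignment_points(alignment_score):
    """Alignment, 0-15 points, from final alignment_score in [0, 1]."""
    if alignment_score >= 0.95:
        return 15
    if alignment_score >= 0.85:
        return 12
    if alignment_score >= 0.70:
        return 8
    return 4

def cohesion_points(cohesion_score):
    """Cohesion, 0-15 points, from final cohesion_score in [0, 1]."""
    if cohesion_score >= 0.70:
        return 15
    if cohesion_score >= 0.50:
        return 12
    if cohesion_score >= 0.30:
        return 8
    return 4

def part1_score(collisions, alignment_score, cohesion_score):
    """Performance Metrics total, 0-50 points."""
    return (collision_points(collisions)
            + alignment_points(alignment_score)
            + cohesion_points(cohesion_score))
```

Part 2 (solution quality) is left to the agentic judge, which adds its 0-50 assessment to this value to form `combined_score`.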
26 changes: 26 additions & 0 deletions configs/variant/circle_packing_agentic.yaml
@@ -0,0 +1,26 @@
# Variant configuration for Circle Packing task with agentic editing
# This enables the multi-turn agentic backend for evolution

defaults:
- override /database@_global_: island_large
- override /task@_global_: circle_packing
- override /evolution@_global_: agentic
- override /cluster@_global_: local

variant_suffix: "_agentic"
exp_name: "shinka_circle_packing"

# Override evo_config with agentic-specific values for circle packing
evo_config:
num_generations: 50
max_parallel_jobs: 4
llm_models:
- "gemini-2.5-flash"
llm_dynamic_selection: ucb
# Override agentic model settings
agentic:
extra_cli_config:
model: "gemini-2.5-flash"
# Use legacy evaluator for circle packing (deterministic metric: sum of radii)
evaluator:
mode: legacy