Changes from all commits (86 commits)
- 1b4c179 Update README.md with arxiv (RobertTLange, Sep 25, 2025)
- 2fb7548 add google gemini embeding model (takeruhukushima, Sep 25, 2025)
- 27af71c fix: Fix database summary when patch_name metadata is missing (dexhunter, Sep 25, 2025)
- 9586cdb Update README.md (RobertTLange, Sep 26, 2025)
- 396c66a Merge pull request #2 from dexhunter/fix/display (RobertTLange, Sep 26, 2025)
- a60bc9e docs: change repo name on the onboarding doc (Koki-Kazaore, Sep 28, 2025)
- 0003552 Update README (Aladoro, Sep 28, 2025)
- be2e203 Added a doc explaining how to add suport for a local LLM and embeddin… (vicruz99, Oct 12, 2025)
- bf0c1d4 Add rust to supported languages (LiaCastaneda, Oct 13, 2025)
- 77d1819 Ensure setuptools discovers subpackages (iwiwi, Oct 14, 2025)
- 929f072 Mark shinka.webui as a package (iwiwi, Oct 14, 2025)
- 59a338c Merge pull request #18 from SakanaAI/fix-packaging (RobertTLange, Oct 15, 2025)
- 23ace36 fix apply_full.py when the patch has incomplete (0,1) markers instead… (51616, Oct 24, 2025)
- 06209a2 Merge pull request #21 from 51616/fix-full-patch-no-markers-bug (RobertTLange, Oct 27, 2025)
- c9c468b Merge pull request #12 from vicruz99/feature/local-models (RobertTLange, Oct 27, 2025)
- c5b1abe Update README.md (RobertTLange, Oct 27, 2025)
- ccc1326 Merge branch 'main' into lia/add-support-for-rust (RobertTLange, Oct 27, 2025)
- e8ef6de Merge pull request #15 from LiaCastaneda/lia/add-support-for-rust (RobertTLange, Oct 27, 2025)
- d2211b2 Merge pull request #7 from Koki-Kazaore/docs/change_repo_name (RobertTLange, Oct 27, 2025)
- ded4576 Update inspirations.py - archive (RobertTLange, Oct 27, 2025)
- 7ceea8c Merge pull request #1 from takeruhukushima/main (RobertTLange, Oct 27, 2025)
- ee6e8a5 Update dependencies gemini embed (RobertTLange, Oct 27, 2025)
- a759778 Update dbase.py path default (RobertTLange, Oct 30, 2025)
- c097a88 Fix reasoning token sampling (RobertTLange, Oct 30, 2025)
- 6d5e208 Fix anthropic budget sampling (RobertTLange, Oct 30, 2025)
- 9b4d7c7 fix shinka_launch --help (RobertTLange, Nov 2, 2025)
- d7a3f7e fix wrap_eval catch (RobertTLange, Nov 2, 2025)
- 397e0fd add documentation for resuming experiments (RobertTLange, Nov 2, 2025)
- f6896dc fix OAI dependency db for visualization (RobertTLange, Nov 2, 2025)
- 94a2805 Merge pull request #28 from SakanaAI/fix_minor (RobertTLange, Nov 2, 2025)
- 1d9d498 Fix init program island copying -> archive (RobertTLange, Nov 2, 2025)
- 2f01b3e fix:GEMINI_API_KEY name error (takeruhukushima, Nov 3, 2025)
- 12738f2 Merge pull request #29 from takeruhukushima/rename_gemini_api (RobertTLange, Nov 3, 2025)
- f5f7e68 use dependency-groups.dev (ifsheldon, Nov 8, 2025)
- 14739fc Add support for Claude Sonnet 4.5 (claude-sonnet-4-5-20250929) (arun-pathiban-ddog, Nov 8, 2025)
- 7dd7245 Merge pull request #35 from ifsheldon/dev-group (RobertTLange, Nov 8, 2025)
- 4f0708b Merge pull request #36 from arun-pathiban-ddog/add-claude-sonnet-4.5-… (RobertTLange, Nov 8, 2025)
- ed9f51f Add Swift language support (jeethu, Nov 3, 2025)
- 0437118 ignore warning for correct behavior when no improvement is detected, … (Aladoro, Nov 11, 2025)
- 831ddf6 Merge pull request #40 from SakanaAI/ignore-logsubtract-warning (RobertTLange, Nov 11, 2025)
- 259e786 Allow boolean flags for eval jobs (jm424, Nov 12, 2025)
- 8a615a4 Merge pull request #41 from jm424/jai/allow_eval_job_bool_flags (RobertTLange, Nov 13, 2025)
- 3251a70 Add json support (jeremycochoy, Nov 17, 2025)
- 1ac33cc Merge pull request #46 from jeremycochoy/feature/json_support (RobertTLange, Nov 19, 2025)
- 3fb579c Merge branch 'main' into jeethu/swift (RobertTLange, Nov 19, 2025)
- 929090c Merge pull request #37 from jeethu/jeethu/swift (RobertTLange, Nov 19, 2025)
- ed8f1b4 llm: Add GPT-5.1 and Gemini 3 Pro models (jm424, Nov 19, 2025)
- 70e485f Merge pull request #48 from jm424/jai/add-newer-models (RobertTLange, Nov 20, 2025)
- ecf762b Update README.md (RobertTLange, Nov 22, 2025)
- c686d7f Update getting_started.md (RobertTLange, Nov 22, 2025)
- bad5b37 Update apply_diff.py (RobertTLange, Dec 3, 2025)
- e12fe6b feat: Agentic backend core and routing logic (GeorgeWingg, Dec 7, 2025)
- bd46743 feat: Add multi-file diff viewer and agentic node indicator (GeorgeWingg, Dec 14, 2025)
- 729ac1a feat: Add Boids Flocking multi-file example (GeorgeWingg, Dec 7, 2025)
- e7faefe fix: Remove embedded script tag breaking HTML parser (GeorgeWingg, Dec 14, 2025)
- 15d579f fix: Align TerminalRenderer signature with MatplotlibRenderer (GeorgeWingg, Dec 7, 2025)
- ea6e91e fix: harden agentic backends and config (GeorgeWingg, Dec 14, 2025)
- 23915e0 feat: codex headless auth (device + api key) (GeorgeWingg, Dec 14, 2025)
- a860e08 fix: prefer subscription auth for codex (GeorgeWingg, Dec 14, 2025)
- ec6307e fix: correct embedding corpus args for agentic files (GeorgeWingg, Dec 14, 2025)
- 810e318 feat: propagate multi-file workspace between generations (GeorgeWingg, Dec 14, 2025)
- 1fda8e3 fix: hydrate workspace for legacy multi-file patches (GeorgeWingg, Dec 14, 2025)
- 6639b62 feat: integrate bandit sampling with agentic mode (GeorgeWingg, Dec 14, 2025)
- fdee648 feat: add boids_flocking_agentic variant and fix config merging (GeorgeWingg, Dec 14, 2025)
- 05c6313 chore: add gpt-5.2 pricing entry and PR validation plan (GeorgeWingg, Dec 15, 2025)
- 3efa551 style: apply black/isort formatting to changed files (GeorgeWingg, Dec 15, 2025)
- af31cf7 fix: agentic prompt architecture - CLI harness owns system prompt (GeorgeWingg, Dec 15, 2025)
- 7e4b3f4 fix: handle missing EVOLVE-BLOCK markers in embedding (GeorgeWingg, Dec 15, 2025)
- b5a34c6 fix: fail loudly when no model configured instead of silent fallback (GeorgeWingg, Dec 15, 2025)
- 700575d fix: add logging for silent fallbacks in cost, credentials, embedding (GeorgeWingg, Dec 15, 2025)
- 0946ee4 docs: update EXECPLAN with silent fallback fixes (GeorgeWingg, Dec 15, 2025)
- a54e3cc fix: full parallelism for agentic mode - thread-safe job submission (GeorgeWingg, Dec 17, 2025)
- fc71a31 feat: add circle_packing_agentic variant config (GeorgeWingg, Dec 17, 2025)
- 20d01c5 fix: enable parallelism with legacy evaluator (GeorgeWingg, Dec 17, 2025)
- 1a08a6a fix: correct flag not being stored in agentic evaluator (GeorgeWingg, Dec 17, 2025)
- 0cf887c fix: execute all bash blocks in agent responses (GeorgeWingg, Dec 17, 2025)
- 5172385 fix: Codex backend event limit and DictConfig serialization bugs (GeorgeWingg, Dec 18, 2025)
- 6577b8c chore: remove dead code from embedding_corpus.py (GeorgeWingg, Dec 18, 2025)
- b5fcd5f chore: remove unused session registry module (GeorgeWingg, Dec 18, 2025)
- d80bff2 chore: remove PR planning document (GeorgeWingg, Dec 18, 2025)
- 36c448d chore: remove unused TerminalRenderer from boids example (GeorgeWingg, Dec 18, 2025)
- 71e8cd3 chore: remove duplicate PROVIDER_ENV_VAR_MAP from shinka_agent (GeorgeWingg, Dec 18, 2025)
- 4dde4d8 feat: add agentic test suite and config cleanup (GeorgeWingg, Dec 18, 2025)
- 92dbada fix: correct import order in codex_cli.py (GeorgeWingg, Dec 18, 2025)
- 8390cf3 fix: use available model names in agentic configs (GeorgeWingg, Dec 18, 2025)
- 3b9ad16 feat: add multi-file embedding corpus support for novelty detection (GeorgeWingg, Dec 18, 2025)
3 changes: 3 additions & 0 deletions .gitignore
@@ -173,3 +173,6 @@ cython_debug/

# PyPI configuration file
.pypirc
results/
examples/boids_flocking/metrics.json
examples/boids_flocking/correct.json
17 changes: 9 additions & 8 deletions README.md
@@ -7,16 +7,16 @@
<img src="https://img.shields.io/badge/python-%3E%3D3.10-blue" />
<a href="https://github.com/SakanaAI/ShinkaEvolve/blob/master/LICENSE.md"><img src="https://img.shields.io/badge/license-Apache2.0-blue.svg" /></a>
<a href="https://github.com/astral-sh/ruff"><img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json" /></a>
<a href="http://arxiv.org/abs/2212.04180"><img src="http://img.shields.io/badge/paper-arxiv.2212.04180-B31B1B.svg" /></a>
<a href="http://arxiv.org/abs/2509.19349"><img src="http://img.shields.io/badge/paper-arxiv.2509.19349-B31B1B.svg" /></a>
<a href="https://colab.research.google.com/github/SakanaAI/ShinkaEvolve/blob/main/examples/shinka_tutorial.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
</p>


`ShinkaEvolve` is a framework that combines Large Language Models (LLMs) with evolutionary algorithms to drive scientific discovery. By leveraging the creative capabilities of LLMs and the optimization power of evolutionary search, `ShinkaEvolve` enables automated exploration and improvement of scientific code. The system is inspired by the [AI Scientist](https://sakana.ai/ai-scientist/), [AlphaEvolve](https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/) and the [Darwin Goedel Machine](https://sakana.ai/dgm/): It maintains a population of programs that evolve over generations, with an ensemble of LLMs acting as intelligent mutation operators that suggest code improvements.
[`ShinkaEvolve`](https://arxiv.org/abs/2509.19349) is a framework that combines Large Language Models (LLMs) with evolutionary algorithms to drive scientific discovery. By leveraging the creative capabilities of LLMs and the optimization power of evolutionary search, `ShinkaEvolve` enables automated exploration and improvement of scientific code. The system is inspired by the [AI Scientist](https://sakana.ai/ai-scientist/), [AlphaEvolve](https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/) and the [Darwin Goedel Machine](https://sakana.ai/dgm/): It maintains a population of programs that evolve over generations, with an ensemble of LLMs acting as intelligent mutation operators that suggest code improvements.

The framework supports **parallel evaluation of candidates** locally or on a Slurm cluster. It maintains an archive of successful solutions, enabling knowledge transfer between different evolutionary islands. `ShinkaEvolve` is particularly well-suited for scientific tasks where there is a verifier available and the goal is to optimize performance metrics while maintaining code correctness and readability.
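The loop this paragraph describes (a population of programs evolved by LLM mutation operators, with an archive of the best solutions) can be sketched as a toy stand-in; all names here are illustrative and this is not Shinka's actual API:

```python
import random

def mutate(program):
    """Stand-in for an LLM mutation operator: perturb one numeric 'program'."""
    return [p + random.uniform(-1, 1) for p in program]

def evaluate(program):
    """Toy verifier: fitness is negative squared distance from the target [3, 3]."""
    return -sum((p - 3) ** 2 for p in program)

def evolve(generations=200, seed=0):
    """Minimal single-island evolutionary loop with a best-so-far archive."""
    random.seed(seed)
    population = [[0.0, 0.0]]
    archive = []  # only appended to when a child improves on the archive tip
    for _ in range(generations):
        parent = max(population, key=evaluate)  # greedy parent selection
        child = mutate(parent)
        population.append(child)
        if not archive or evaluate(child) > evaluate(archive[-1]):
            archive.append(child)
    return archive
```

In the real system the mutation step is an ensemble of LLMs proposing code patches, evaluation runs the task's verifier in parallel (locally or on Slurm), and multiple islands share solutions through the archive.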

![](docs/conceptual.png)
![evolution](https://github.com/user-attachments/assets/22cf3468-17fe-4995-9e13-d602b490a54e)

## Documentation 📝

@@ -26,6 +26,7 @@ The framework supports **parallel evaluation of candidates** locally or on a Slu
| 📓 **[Tutorial Notebook](examples/shinka_tutorial.ipynb)** | Interactive walkthrough of Shinka features | Hands-on examples, configuration, best practices |
| ⚙️ **[Configuration](docs/configuration.md)** | Comprehensive configuration reference | All config options, optimization settings, advanced features |
| 🎨 **[WebUI](docs/webui.md)** | Interactive visualization and monitoring | Real-time tracking, result analysis, debugging tools |
|🕹️ **[Local LLM Support](https://github.com/SakanaAI/ShinkaEvolve/blob/main/docs/support_local_llm.md)**| Instructions for Local LLMs | How to setup local LLMs on your machine|

## Installation & Quick Start 🚀

@@ -52,9 +53,9 @@ For detailed installation instructions and usage examples, see the [Getting Star
| Example | Description | Environment Setup |
|---------|-------------|-------------------|
| ⭕ [Circle Packing](examples/circle_packing) | Optimize circle packing to maximize radii. | `LocalJobConfig` |
| 🤖 [Agent Design](examples/agent_design) | Design agent scaffolds for math tasks. | `LocalJobConfig` |
| 🤖 [Agent Design](examples/adas_aime) | Design agent scaffolds for math tasks. | `LocalJobConfig` |
| 🎯 [ALE-Bench](examples/ale_bench) | Code optimization for ALE-Bench tasks. | `LocalJobConfig` |
| ✨ [Novelty Generator](examples/novelty_generator_bck) | Generate creative, surprising outputs (e.g., ASCII art). | `LocalJobConfig` |
| ✨ [Novelty Generator](examples/novelty_generator) | Generate creative, surprising outputs (e.g., ASCII art). | `LocalJobConfig` |


## `shinka` Run with Python API 🐍
@@ -308,9 +309,9 @@ If you use `ShinkaEvolve` in your research, please cite it as follows:

```
@article{lange2025shinka,
title={ShinkaEvolve: Towards Open-Ended and Sample-Efficient Program Evolution},
title={ShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution},
author={Lange, Robert Tjarko and Imajuku, Yuki and Cetin, Edoardo},
journal={arXiv preprint},
journal={arXiv preprint arXiv:2509.19349},
year={2025}
}
```
3 changes: 2 additions & 1 deletion configs/cluster/local.yaml
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
job_config:
_target_: shinka.launch.LocalJobConfig
eval_program_path: ${distributed_job_config.eval_program_path}

eval_command: ${oc.select:distributed_job_config.eval_command,null}

evo_config:
job_type: "local"
4 changes: 2 additions & 2 deletions configs/config.yaml
@@ -2,9 +2,9 @@ defaults:
- _self_
- database@_global_: island_small
- evolution@_global_: small_budget
- task@_global_: mad_tf
- task@_global_: circle_packing
- cluster@_global_: local
- variant@_global_: mad_tf_example
- variant@_global_: circle_packing_example

verbose: false
results_dir: results
45 changes: 45 additions & 0 deletions configs/evolution/agentic.yaml
@@ -0,0 +1,45 @@
evo_config:
_target_: shinka.core.EvolutionConfig
agentic_mode: true
# LLM models for patch generation (used by bandit sampling)
llm_models:
- "gpt-4.1"
- "claude-sonnet-4-20250514"
- "gemini-2.5-flash"
llm_dynamic_selection: ucb
embedding_model: "text-embedding-3-small"
num_generations: 2
max_parallel_jobs: 1
agentic:
_target_: shinka.core.runner.AgenticConfig
backend: "shinka"
cli_profile: null
sandbox: "workspace-write"
approval_mode: "full-auto"
max_turns: 50
max_seconds: 0
cli_path: null
extra_cli_config:
# Model used for agentic editing sessions
# REQUIRED: Will fail if not set (no silent fallbacks to old models)
model: "gpt-4.1"
resume_parent_session: false
# Use /tmp to isolate scratch dirs from git repos, preventing Codex CLI
# from discovering parent AGENTS.md files. Set to null to use results_dir.
scratch_dir_base: "/tmp/shinka_scratch"
evaluator:
_target_: shinka.core.runner.EvaluatorConfig
mode: auto
agentic:
_target_: shinka.core.runner.AgenticEvaluatorConfig
# If null, inherits backend from agentic.backend
backend: null
sandbox: "workspace-write"
approval_mode: "full-auto"
max_events: 80
max_seconds: 0
extra_cli_config:
model: "gpt-4.1"
# Custom evaluation criteria (null for default quantitative eval)
eval_prompt: null
results_dir: ${output_dir}
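Variant files later in this PR override nested fields of this base evolution config (e.g. swapping the `model` inside `agentic.extra_cli_config`). The real merge is handled by Hydra/OmegaConf, but the deep-merge semantics can be sketched in plain Python; the dicts below are small hand-copied subsets of the YAML, and `deep_merge` is an illustrative helper, not a Shinka function:

```python
def deep_merge(base, override):
    """Recursively merge `override` into `base`; override wins on leaves."""
    merged = dict(base)
    for key, value in override.items():
        if key in merged and isinstance(merged[key], dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value  # leaves (including lists) are replaced wholesale
    return merged

# Subset of configs/evolution/agentic.yaml
base = {
    "llm_models": ["gpt-4.1", "claude-sonnet-4-20250514", "gemini-2.5-flash"],
    "agentic": {"backend": "shinka", "extra_cli_config": {"model": "gpt-4.1"}},
}
# Subset of configs/variant/circle_packing_agentic.yaml
variant = {
    "llm_models": ["gemini-2.5-flash"],
    "agentic": {"extra_cli_config": {"model": "gemini-2.5-flash"}},
}
cfg = deep_merge(base, variant)
```

Note that untouched sibling keys (here `agentic.backend`) survive the merge, while overridden lists replace the base list entirely rather than concatenating.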
55 changes: 55 additions & 0 deletions configs/task/boids_flocking.yaml
@@ -0,0 +1,55 @@
# Boids Flocking Task Configuration
# Task: Evolve flocking behavior to minimize collisions while maintaining tight grouping

# Task metadata (used by UI/logging)
task:
task_name: boids_flocking
description: |
Optimize the Boids flocking simulation. The goal is to evolve the separation,
alignment, and cohesion behaviors to:
1. Minimize collisions between boids
2. Maintain tight grouping (cohesion)
3. Achieve good velocity alignment

The simulation runs for 1000 steps with 50 boids. Improve the scoring function,
behavior weights, and physics parameters to achieve a higher combined score.
exec_fname: main.py
init_support_dir: examples/boids_flocking
language: python
metrics_fname: metrics.json
correct_fname: correct.json
score_key: combined_score
higher_is_better: true
allowed_files:
- boid.py
- simulation.py
- render.py
- main.py
primary_file: main.py

# Evolution config overrides (merged into global evo_config)
evo_config:
init_program_path: "examples/boids_flocking/main.py"
task_sys_msg: |
You are an expert in emergent behavior simulation and evolutionary algorithms.
Optimize the Boids flocking simulation to achieve:
1. Minimize collisions between boids (separation)
2. Maintain tight grouping (cohesion)
3. Achieve good velocity alignment

The simulation runs 1000 steps with 50 boids. You can edit multiple files:
- main.py: Entry point and configuration
- boid.py: Individual boid behavior
- simulation.py: Simulation loop and physics
- render.py: Visualization (optional)

Focus on tuning behavior weights, perception radius, and force calculations.
language: python
init_support_dir: examples/boids_flocking
job_type: local

distributed_job_config:
eval_program_path: "examples/boids_flocking/main.py"
# Don't set eval_command - let framework pass --results_dir dynamically

exp_name: shinka_boids_flocking
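The task config above points the evaluator at `metrics.json` (read via `score_key: combined_score`) and `correct.json` in the results directory. A minimal sketch of what the evaluated program would write to satisfy that contract; the exact JSON schema and the helper name here are assumptions, not taken from the Shinka codebase:

```python
import json
import os
import tempfile

def write_results(results_dir, combined_score, correct):
    """Write the metrics/correctness files the task config points at."""
    os.makedirs(results_dir, exist_ok=True)
    # metrics.json must contain the key named by score_key in the task config
    with open(os.path.join(results_dir, "metrics.json"), "w") as f:
        json.dump({"combined_score": combined_score}, f)
    # correct.json carries the boolean validity flag
    with open(os.path.join(results_dir, "correct.json"), "w") as f:
        json.dump({"correct": correct}, f)

results_dir = tempfile.mkdtemp()
write_results(results_dir, combined_score=72.5, correct=True)
```

Since `higher_is_better: true`, the framework would rank candidates by this `combined_score` value directly.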
2 changes: 2 additions & 0 deletions configs/task/circle_packing.yaml
@@ -30,6 +30,8 @@ evo_config:
7. The math literature suggests special arrangements for specific values of n

Be creative and try to find a new solution.

IMPORTANT: Your solution must be in main.py - this is the file that gets evaluated.
language: "python"
init_program_path: "examples/circle_packing/initial.py"
job_type: "slurm_conda"
13 changes: 13 additions & 0 deletions configs/variant/boids_flocking.yaml
@@ -0,0 +1,13 @@
# Variant configuration for Boids Flocking task
# This defines default overrides for the boids task

defaults:
- /task: boids_flocking
- /evolution: small_budget

variant_suffix: "_boids"

# Task-specific evolution overrides
evo_config:
# Enable agentic mode for multi-file editing
agentic_mode: false # Set to true for agentic experiments
74 changes: 74 additions & 0 deletions configs/variant/boids_flocking_agentic.yaml
@@ -0,0 +1,74 @@
# Variant configuration for Boids Flocking task with agentic editing
# This enables the multi-turn agentic backend for multi-file evolution

defaults:
- override /task@_global_: boids_flocking
- override /evolution@_global_: agentic

variant_suffix: "_boids_agentic"
exp_name: "shinka_boids_flocking"

# Override evo_config with boids-specific values (applied last)
evo_config:
init_program_path: "examples/boids_flocking/main.py"
init_support_dir: examples/boids_flocking
max_score: 100.0
num_generations: 30
max_parallel_jobs: 2
llm_models:
- "gemini-2.5-flash"
agentic:
extra_cli_config:
model: "gemini-2.5-flash"
task_sys_msg: |
You are an expert in emergent behavior simulation and evolutionary algorithms.
Optimize the Boids flocking simulation to achieve beautiful, natural flocking behavior.

The simulation runs 1000 steps with 50 boids. You can edit multiple files:
- main.py: Entry point and configuration
- boid.py: Individual boid behavior
- simulation.py: Simulation loop and physics
- render.py: Visualization (optional)

Focus on creating emergent patterns, smooth motion, and natural group dynamics.
evaluator:
agentic:
extra_cli_config:
model: "gemini-2.5-flash"
eval_prompt: |
Evaluate this boids simulation using BOTH quantitative metrics AND code quality.

## Part 1: Performance Metrics (0-50 points)
Run the simulation and read the ACTUAL metrics from stdout.

**Collision Avoidance** (0-20 points):
- 0 collisions = 20 pts | <100 = 15 pts | <500 = 10 pts | <1000 = 5 pts | >=1000 = 0 pts

**Alignment** (0-15 points): Read final alignment_score (0.0-1.0)
- >=0.95 = 15 pts | >=0.85 = 12 pts | >=0.70 = 8 pts | <0.70 = 4 pts

**Cohesion** (0-15 points): Read final cohesion_score (0.0-1.0)
- >=0.70 = 15 pts | >=0.50 = 12 pts | >=0.30 = 8 pts | <0.30 = 4 pts

## Part 2: Solution Quality (0-50 points)
Review the code in boid.py, simulation.py, and main.py.

**Algorithm Elegance** (0-20 points):
- Novel/creative approach to flocking behavior?
- Clean separation of concerns?
- Efficient force calculations?
- Smart use of spatial partitioning or other optimizations?

**Parameter Tuning** (0-15 points):
- Well-reasoned weight values for separation/alignment/cohesion?
- Appropriate perception/separation radii?
- Good balance between stability and responsiveness?

**Code Quality** (0-15 points):
- Readable and well-structured?
- No hacky workarounds or magic numbers without explanation?
- Would this scale to more boids?

IMPORTANT: Base performance scores on ACTUAL simulation output, not guesses.
combined_score = Part 1 + Part 2 (0-100)
correct = true if simulation runs without crashes
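Part 1 of this rubric is fully specified by thresholds, so its scoring can be computed mechanically from the simulation's metrics; a direct transcription of the three point tables (function names are illustrative):

```python
def collision_points(collisions):
    """Collision Avoidance, 0-20 points."""
    if collisions == 0:
        return 20
    if collisions < 100:
        return 15
    if collisions < 500:
        return 10
    if collisions < 1000:
        return 5
    return 0

def alignment_points(alignment_score):
    """Alignment, 0-15 points, from final alignment_score in [0, 1]."""
    if alignment_score >= 0.95:
        return 15
    if alignment_score >= 0.85:
        return 12
    if alignment_score >= 0.70:
        return 8
    return 4

def cohesion_points(cohesion_score):
    """Cohesion, 0-15 points, from final cohesion_score in [0, 1]."""
    if cohesion_score >= 0.70:
        return 15
    if cohesion_score >= 0.50:
        return 12
    if cohesion_score >= 0.30:
        return 8
    return 4

def part1_score(collisions, alignment_score, cohesion_score):
    """Performance Metrics total, 0-50 points."""
    return (collision_points(collisions)
            + alignment_points(alignment_score)
            + cohesion_points(cohesion_score))
```

Part 2 (solution quality) is left to the agentic judge, which adds its 0-50 assessment to this value to form `combined_score`.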
26 changes: 26 additions & 0 deletions configs/variant/circle_packing_agentic.yaml
@@ -0,0 +1,26 @@
# Variant configuration for Circle Packing task with agentic editing
# This enables the multi-turn agentic backend for evolution

defaults:
- override /database@_global_: island_large
- override /task@_global_: circle_packing
- override /evolution@_global_: agentic
- override /cluster@_global_: local

variant_suffix: "_agentic"
exp_name: "shinka_circle_packing"

# Override evo_config with agentic-specific values for circle packing
evo_config:
num_generations: 50
max_parallel_jobs: 4
llm_models:
- "gemini-2.5-flash"
llm_dynamic_selection: ucb
# Override agentic model settings
agentic:
extra_cli_config:
model: "gemini-2.5-flash"
# Use legacy evaluator for circle packing (deterministic metric: sum of radii)
evaluator:
mode: legacy