New file: `.claude/skills/shinka-convert/SKILL.md` (170 additions)
---
name: shinka-convert
description: Convert an existing codebase in the current working directory into a ShinkaEvolve task directory by snapshotting the relevant code, adding evolve blocks, and generating `evaluate.py` plus Shinka runner/config files. Use when the user wants to optimize existing code with Shinka instead of creating a brand-new task from a natural-language description.
---

# Shinka Convert Skill
Use this skill to turn an existing project into a Shinka-ready task.

This is the alternative starting point to `shinka-setup`:
- `shinka-setup`: new task from natural-language task description
- `shinka-convert`: existing codebase converted into a Shinka task

After conversion, the user should still be able to use `shinka-run`.

## When to Use
Invoke this skill when the user:
- Wants to optimize an existing script or repo with Shinka/ShinkaEvolve
- Mentions adapting current code to Shinka output signatures, `metrics.json`, `correct.json`, or `EVOLVE-BLOCK` markers
- Wants a sidecar Shinka task generated from the current working directory

Do not use this skill when:
- The user wants a brand-new task scaffold from only a natural-language description
- `evaluate.py` and `initial.<ext>` already exist and the user only wants to launch evolution; use `shinka-run`

## User Inputs
Start from freeform instructions, then ask follow-ups only if high-impact details are missing.

Collect:
- What behavior or file/function to optimize
- Score direction and main metric
- Constraints: correctness, runtime, memory, determinism, style, allowed edits
- Whether original source must remain untouched
- Any required data/assets/dependencies

## Default Output
Generate a sidecar task directory at `./shinka_task/` unless the user requests another path.

The task directory should contain:
- `evaluate.py`
- `run_evo.py`
- `shinka.yaml`
- `initial.<ext>`
- A copied snapshot of the minimal runnable source subtree needed for evaluation

Do not edit the original source tree unless the user explicitly requests in-place conversion.

## Workflow
1. Inspect the current working directory.
- Identify language, entrypoints, package/module layout, dependencies, and current outputs.
- Prefer concrete evidence from the code over guesses.
2. Infer the evolvable region from the user's instructions.
- If ambiguous, ask targeted follow-ups.
- Keep the mutable region as small as practical.
3. Choose the minimal runnable snapshot scope.
- Copy only the source subtree needed to execute the task in isolation.
- Avoid repo-wide snapshots unless imports/runtime make that necessary.
4. Create the sidecar task directory.
- Default: `./shinka_task/`
- Avoid overwriting an existing task dir without consent.
5. Rewrite the snapshot into a stable Shinka contract.
- Preserve original behavior outside the evolvable region.
- Keep CLI behavior intact where practical.
- Ensure the evolvable candidate entry file is named `initial.<ext>` so `shinka-run` can detect it.
- Add tight `EVOLVE-BLOCK-START` / `EVOLVE-BLOCK-END` markers.
6. Generate the evaluator path.
- Python: prefer exposing `run_experiment(...)` and use `run_shinka_eval`.
- Non-Python: use `subprocess` and write `metrics.json` plus `correct.json`.
7. Generate `run_evo.py` and `shinka.yaml`.
- Ensure `init_program_path` and `language` match the candidate file.
- Keep the output directly compatible with `shinka-run`.
8. Smoke test before handoff.
- Run `python evaluate.py --program_path <initial file> --results_dir /tmp/shinka_convert_smoke`
- Confirm evaluator runs without exceptions.
- Confirm required metrics/correctness outputs are written.
9. Ask the user for the next step.
- Either run evolution manually
- Or use the `shinka-run` skill

## Conversion Strategy by Language
### Python
- Preferred path: expose `run_experiment(...)` in the snapshot and evaluate via `run_shinka_eval`
- If the existing code is CLI-only, prefer adding a thin `run_experiment` wrapper in the snapshot; fall back to a subprocess evaluator only if the imports are too brittle
- Keep imports relative to the copied task snapshot stable

### Non-Python
- Keep the candidate program executable in its own runtime
- Use Python `evaluate.py` as the Shinka entrypoint
- Write `metrics.json` and `correct.json` in `results_dir`

## Required Evaluator Contract
Metrics must include:
- `combined_score`
- `public`
- `private`
- `extra_data`
- `text_feedback`

Correctness must include:
- `correct`
- `error`

Higher `combined_score` values indicate better performance unless the user explicitly defines an inverted metric that you transform during aggregation.
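For example, if the user's metric is "lower runtime is better", invert it during aggregation so higher `combined_score` still means better. A minimal sketch, assuming each run returns a `(runtime_seconds, feedback)` pair:

```python
def aggregate_fn(run_results):
    """Aggregate per-run (runtime_seconds, feedback) pairs into Shinka metrics.

    Assumes lower runtime is better; negating the mean makes a higher
    combined_score correspond to faster code.
    """
    runtimes = [runtime for runtime, _feedback in run_results]
    mean_runtime = sum(runtimes) / len(runtimes)
    return {
        "combined_score": -mean_runtime,  # inverted: higher is better
        "public": {"mean_runtime_s": mean_runtime},
        "private": {},
        "extra_data": {},
        "text_feedback": f"mean runtime {mean_runtime:.3f}s over {len(runtimes)} runs",
    }
```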

## Python Conversion Template
Prefer shaping the copied program like this:

```py
from __future__ import annotations

# EVOLVE-BLOCK-START
def optimize_me(...):
    ...
# EVOLVE-BLOCK-END

def run_experiment(random_seed: int | None = None, **kwargs):
    ...
    return score, text_feedback
```

And the evaluator:

```py
from shinka.core import run_shinka_eval

# get_kwargs, aggregate_fn, and validate_fn are task-specific callables you
# define in evaluate.py: per-run kwargs, metric aggregation, correctness check.

def main(program_path: str, results_dir: str):
    metrics, correct, err = run_shinka_eval(
        program_path=program_path,
        results_dir=results_dir,
        experiment_fn_name="run_experiment",
        num_runs=3,
        get_experiment_kwargs=get_kwargs,
        aggregate_metrics_fn=aggregate_fn,
        validate_fn=validate_fn,
    )
    if not correct:
        raise RuntimeError(err or "Evaluation failed")
```

## Non-Python Conversion Template
Use `evaluate.py` to run the candidate and write outputs:

```py
import json
import os
import subprocess
from pathlib import Path

def main(program_path: str, results_dir: str):
    os.makedirs(results_dir, exist_ok=True)
    metrics = {
        "combined_score": 0.0,
        "public": {},
        "private": {},
        "extra_data": {},
        "text_feedback": "",
    }
    correct = {"correct": False, "error": ""}

    # Run the candidate in its own runtime; adjust the command for the
    # language (e.g. ["node", program_path]) and score its output as needed.
    result = subprocess.run([program_path], capture_output=True, text=True)
    correct["correct"] = result.returncode == 0
    if not correct["correct"]:
        correct["error"] = result.stderr

    (Path(results_dir) / "metrics.json").write_text(json.dumps(metrics, indent=2))
    (Path(results_dir) / "correct.json").write_text(json.dumps(correct, indent=2))
```

## Bundled Assets
- Use `scripts/run_evo.py` as the starting runner template
- Use `scripts/shinka.yaml` as the starting config template

## Notes
- Keep evolve regions tight; do not make the whole project mutable by default
- Preserve correctness checks outside the evolve region where possible
- Prefer deterministic evaluation and stable seeds
- If the converted task is ready, offer to continue with `shinka-run`
New file: `.claude/skills/shinka-convert/scripts/run_evo.py` (44 additions)
#!/usr/bin/env python3
import argparse
import asyncio
import yaml

from shinka.core import EvolutionConfig, ShinkaEvolveRunner
from shinka.database import DatabaseConfig
from shinka.launch import LocalJobConfig

TASK_SYS_MSG = """You are optimizing code converted from an existing codebase.
Preserve the task contract and keep changes focused on the intended EVOLVE-BLOCK regions.
Do not break evaluation outputs, result file schemas, or imports required by the task snapshot."""


async def main(config_path: str):
    with open(config_path, "r", encoding="utf-8") as f:
        config = yaml.safe_load(f)

    config["evo_config"]["task_sys_msg"] = TASK_SYS_MSG
    evo_config = EvolutionConfig(**config["evo_config"])
    job_config = LocalJobConfig(
        eval_program_path="evaluate.py",
        time="05:00:00",
    )
    db_config = DatabaseConfig(**config["db_config"])

    runner = ShinkaEvolveRunner(
        evo_config=evo_config,
        job_config=job_config,
        db_config=db_config,
        max_evaluation_jobs=config["max_evaluation_jobs"],
        max_proposal_jobs=config["max_proposal_jobs"],
        max_db_workers=config["max_db_workers"],
        debug=False,
        verbose=True,
    )
    await runner.run()


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--config_path", type=str, default="shinka.yaml")
    args = parser.parse_args()
    asyncio.run(main(args.config_path))
New file: `.claude/skills/shinka-convert/scripts/shinka.yaml` (48 additions)
max_evaluation_jobs: 5
max_proposal_jobs: 5
max_db_workers: 4

db_config:
  db_path: evolution_db.sqlite
  num_islands: 2
  archive_size: 40
  elite_selection_ratio: 0.3
  num_archive_inspirations: 4
  num_top_k_inspirations: 2
  migration_interval: 10
  migration_rate: 0.1
  island_elitism: true
  enforce_island_separation: true
  parent_selection_strategy: weighted
  parent_selection_lambda: 10

evo_config:
  patch_types: [diff, full, cross]
  patch_type_probs: [0.6, 0.3, 0.1]
  num_generations: 100
  max_api_costs: 0.1
  max_patch_resamples: 3
  max_patch_attempts: 3
  max_novelty_attempts: 3
  job_type: local
  language: python
  llm_models: ["gemini-3-flash-preview", "gpt-5-mini", "gpt-5-nano"]
  llm_kwargs:
    temperatures: [0, 0.5, 1.0]
    reasoning_efforts: [min, low]
    max_tokens: 32768
  meta_rec_interval: 40
  meta_llm_models: ["gpt-5-mini"]
  meta_llm_kwargs:
    temperatures: [0]
    max_tokens: 16384
  embedding_model: text-embedding-3-small
  code_embed_sim_threshold: 0.99
  novelty_llm_models: ["gpt-5-nano"]
  novelty_llm_kwargs:
    temperatures: [0]
  init_program_path: initial.py
  llm_dynamic_selection: ucb1
  llm_dynamic_selection_kwargs:
    exploration_coef: 1
  results_dir: results/results_task
New file: `.claude/skills/shinka-inspect/SKILL.md` (63 additions)
---
name: shinka-inspect
description: Load top-performing Shinka programs into agent context using `shinka.utils.load_programs_to_df`, and emit a compact Markdown bundle for iteration planning.
---

# Shinka Inspect Skill
Extract the strongest programs from a Shinka run and package them into a context file that coding agents can load directly.

## When to Use
Use this skill when:
- A run already produced a results directory and SQLite database
- You want to inspect top-performing programs before launching the next batch
- You want a compact context artifact instead of manually browsing the DB

Do not use this skill when:
- You still need to scaffold a task (`shinka-setup`)
- You need to run evolution batches (`shinka-run`)

## What it does
- Uses `shinka.utils.load_programs_to_df` to read program records
- Ranks programs by `combined_score`
- Selects top-`k` correct programs (fallback to top-`k` overall if no correct rows)
- Writes one Markdown bundle with metadata, ranking table, feedback, and code snippets
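The selection step amounts to a few DataFrame operations. A sketch with hypothetical column names (`id`, `correct`, `combined_score`) — verify them against the frame actually returned by `load_programs_to_df`:

```python
import pandas as pd

def select_top_k(df: pd.DataFrame, k: int = 5) -> pd.DataFrame:
    """Top-k correct programs by combined_score, falling back to all rows."""
    correct_rows = df[df["correct"] == True]  # noqa: E712
    pool = correct_rows if not correct_rows.empty else df  # fallback case
    return pool.sort_values("combined_score", ascending=False).head(k)
```

When no correct rows exist, the fallback branch ranks all programs so the bundle is never empty.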

## Workflow
1. Confirm run artifacts exist
```bash
ls -la <results_dir>
```

2. Generate context bundle
```bash
python skills/shinka-inspect/scripts/inspect_best_programs.py \
--results-dir <results_dir> \
--k 5
```

3. Optional tuning knobs
```bash
python skills/shinka-inspect/scripts/inspect_best_programs.py \
--results-dir <results_dir> \
--k 8 \
--max-code-chars 5000 \
--min-generation 10 \
--out <results_dir>/inspect/top_programs.md
```

4. Load output into agent context
- Default output path: `<results_dir>/shinka_inspect_context.md`
- Use it as the context artifact for next-step mutation planning

## CLI Arguments
- `--results-dir`: Path to run directory (or direct DB file path)
- `--k`: Number of programs to include (default `5`)
- `--out`: Output markdown path (default under results dir)
- `--max-code-chars`: Per-program code truncation cap (default `4000`)
- `--min-generation`: Optional lower bound on generation
- `--include-feedback` / `--no-include-feedback`: Include `text_feedback` blocks

## Notes
- Ranking metric is `combined_score`.
- If no correct rows exist, the script falls back to the top-scoring rows and labels the fallback in the output.
- Script is read-only for run artifacts (writes only the markdown bundle).