New file: `.claude/skills/shinka-convert/SKILL.md` (170 additions)
---
name: shinka-convert
description: Convert an existing codebase in the current working directory into a ShinkaEvolve task directory by snapshotting the relevant code, adding evolve blocks, and generating `evaluate.py` plus Shinka runner/config files. Use when the user wants to optimize existing code with Shinka instead of creating a brand-new task from a natural-language description.
---

# Shinka Convert Skill
Use this skill to turn an existing project into a Shinka-ready task.

This is the alternative starting point to `shinka-setup`:
- `shinka-setup`: new task from natural-language task description
- `shinka-convert`: existing codebase converted into a Shinka task

After conversion, the user should still be able to use `shinka-run`.

## When to Use
Invoke this skill when the user:
- Wants to optimize an existing script or repo with Shinka/ShinkaEvolve
- Mentions adapting current code to Shinka output signatures, `metrics.json`, `correct.json`, or `EVOLVE-BLOCK` markers
- Wants a sidecar Shinka task generated from the current working directory

Do not use this skill when:
- The user wants a brand-new task scaffold from only a natural-language description
- `evaluate.py` and `initial.<ext>` already exist and the user only wants to launch evolution; use `shinka-run`

## User Inputs
Start from freeform instructions, then ask follow-ups only if high-impact details are missing.

Collect:
- What behavior or file/function to optimize
- Score direction and main metric
- Constraints: correctness, runtime, memory, determinism, style, allowed edits
- Whether original source must remain untouched
- Any required data/assets/dependencies

## Default Output
Generate a sidecar task directory at `./shinka_task/` unless the user requests another path.

The task directory should contain:
- `evaluate.py`
- `run_evo.py`
- `shinka.yaml`
- `initial.<ext>`
- A copied snapshot of the minimal runnable source subtree needed for evaluation

Do not edit the original source tree unless the user explicitly requests in-place conversion.

## Workflow
1. Inspect the current working directory.
- Identify language, entrypoints, package/module layout, dependencies, and current outputs.
- Prefer concrete evidence from the code over guesses.
2. Infer the evolvable region from the user's instructions.
- If ambiguous, ask targeted follow-ups.
- Keep the mutable region as small as practical.
3. Choose the minimal runnable snapshot scope.
- Copy only the source subtree needed to execute the task in isolation.
- Avoid repo-wide snapshots unless imports/runtime make that necessary.
4. Create the sidecar task directory.
- Default: `./shinka_task/`
- Avoid overwriting an existing task dir without consent.
5. Rewrite the snapshot into a stable Shinka contract.
- Preserve original behavior outside the evolvable region.
- Keep CLI behavior intact where practical.
- Ensure the evolvable candidate entry file is named `initial.<ext>` so `shinka-run` can detect it.
- Add tight `EVOLVE-BLOCK-START` / `EVOLVE-BLOCK-END` markers.
6. Generate the evaluator path.
- Python: prefer exposing `run_experiment(...)` and use `run_shinka_eval`.
- Non-Python: use `subprocess` and write `metrics.json` plus `correct.json`.
7. Generate `run_evo.py` and `shinka.yaml`.
- Ensure `init_program_path` and `language` match the candidate file.
- Keep the output directly compatible with `shinka-run`.
8. Smoke test before handoff.
- Run `python evaluate.py --program_path <initial file> --results_dir /tmp/shinka_convert_smoke`
- Confirm evaluator runs without exceptions.
- Confirm required metrics/correctness outputs are written.
9. Ask the user for the next step.
- Either run evolution manually
- Or use the `shinka-run` skill

## Conversion Strategy by Language
### Python
- Preferred path: expose `run_experiment(...)` in the snapshot and evaluate via `run_shinka_eval`
- If the existing code is CLI-only, prefer adding a thin `run_experiment` wrapper in the snapshot; fall back to a subprocess evaluator only if the imports are too brittle
- Keep imports relative to the copied task snapshot stable

### Non-Python
- Keep the candidate program executable in its own runtime
- Use Python `evaluate.py` as the Shinka entrypoint
- Write `metrics.json` and `correct.json` in `results_dir`

## Required Evaluator Contract
Metrics must include:
- `combined_score`
- `public`
- `private`
- `extra_data`
- `text_feedback`

Correctness must include:
- `correct`
- `error`

Higher `combined_score` values indicate better performance unless the user explicitly defines an inverted metric that you transform during aggregation.
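For example, if the user's metric is "lower runtime is better", invert it during aggregation so higher `combined_score` still means better. A minimal sketch, assuming each run returns a `(runtime_seconds, feedback)` pair:

```python
def aggregate_fn(run_results):
    """Aggregate per-run (runtime_seconds, feedback) pairs into Shinka metrics.

    Assumes lower runtime is better; negating the mean makes a higher
    combined_score correspond to faster code.
    """
    runtimes = [runtime for runtime, _feedback in run_results]
    mean_runtime = sum(runtimes) / len(runtimes)
    return {
        "combined_score": -mean_runtime,  # inverted: higher is better
        "public": {"mean_runtime_s": mean_runtime},
        "private": {},
        "extra_data": {},
        "text_feedback": f"mean runtime {mean_runtime:.3f}s over {len(runtimes)} runs",
    }
```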

## Python Conversion Template
Prefer shaping the copied program like this:

```py
from __future__ import annotations

# EVOLVE-BLOCK-START
def optimize_me(...):
    ...
# EVOLVE-BLOCK-END

def run_experiment(random_seed: int | None = None, **kwargs):
    ...
    return score, text_feedback
```

And the evaluator:

```py
from shinka.core import run_shinka_eval

# get_kwargs, aggregate_fn, and validate_fn are task-specific callables you
# define in evaluate.py: per-run kwargs, metric aggregation, correctness check.

def main(program_path: str, results_dir: str):
    metrics, correct, err = run_shinka_eval(
        program_path=program_path,
        results_dir=results_dir,
        experiment_fn_name="run_experiment",
        num_runs=3,
        get_experiment_kwargs=get_kwargs,
        aggregate_metrics_fn=aggregate_fn,
        validate_fn=validate_fn,
    )
    if not correct:
        raise RuntimeError(err or "Evaluation failed")
```

## Non-Python Conversion Template
Use `evaluate.py` to run the candidate and write outputs:

```py
import json
import os
import subprocess
from pathlib import Path

def main(program_path: str, results_dir: str):
    os.makedirs(results_dir, exist_ok=True)
    metrics = {
        "combined_score": 0.0,
        "public": {},
        "private": {},
        "extra_data": {},
        "text_feedback": "",
    }
    correct = {"correct": False, "error": ""}

    # Run the candidate in its own runtime; adjust the command for the
    # language (e.g. ["node", program_path]) and score its output as needed.
    result = subprocess.run([program_path], capture_output=True, text=True)
    correct["correct"] = result.returncode == 0
    if not correct["correct"]:
        correct["error"] = result.stderr

    (Path(results_dir) / "metrics.json").write_text(json.dumps(metrics, indent=2))
    (Path(results_dir) / "correct.json").write_text(json.dumps(correct, indent=2))
```

## Bundled Assets
- Use `scripts/run_evo.py` as the starting runner template
- Use `scripts/shinka.yaml` as the starting config template

## Notes
- Keep evolve regions tight; do not make the whole project mutable by default
- Preserve correctness checks outside the evolve region where possible
- Prefer deterministic evaluation and stable seeds
- If the converted task is ready, offer to continue with `shinka-run`
New file: `.claude/skills/shinka-convert/scripts/run_evo.py` (44 additions)
#!/usr/bin/env python3
import argparse
import asyncio
import yaml

from shinka.core import EvolutionConfig, ShinkaEvolveRunner
from shinka.database import DatabaseConfig
from shinka.launch import LocalJobConfig

TASK_SYS_MSG = """You are optimizing code converted from an existing codebase.
Preserve the task contract and keep changes focused on the intended EVOLVE-BLOCK regions.
Do not break evaluation outputs, result file schemas, or imports required by the task snapshot."""


async def main(config_path: str):
    with open(config_path, "r", encoding="utf-8") as f:
        config = yaml.safe_load(f)

    config["evo_config"]["task_sys_msg"] = TASK_SYS_MSG
    evo_config = EvolutionConfig(**config["evo_config"])
    job_config = LocalJobConfig(
        eval_program_path="evaluate.py",
        time="05:00:00",
    )
    db_config = DatabaseConfig(**config["db_config"])

    runner = ShinkaEvolveRunner(
        evo_config=evo_config,
        job_config=job_config,
        db_config=db_config,
        max_evaluation_jobs=config["max_evaluation_jobs"],
        max_proposal_jobs=config["max_proposal_jobs"],
        max_db_workers=config["max_db_workers"],
        debug=False,
        verbose=True,
    )
    await runner.run()


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--config_path", type=str, default="shinka.yaml")
    args = parser.parse_args()
    asyncio.run(main(args.config_path))
New file: `.claude/skills/shinka-convert/scripts/shinka.yaml` (48 additions)
max_evaluation_jobs: 5
max_proposal_jobs: 5
max_db_workers: 4

db_config:
  db_path: evolution_db.sqlite
  num_islands: 2
  archive_size: 40
  elite_selection_ratio: 0.3
  num_archive_inspirations: 4
  num_top_k_inspirations: 2
  migration_interval: 10
  migration_rate: 0.1
  island_elitism: true
  enforce_island_separation: true
  parent_selection_strategy: weighted
  parent_selection_lambda: 10

evo_config:
  patch_types: [diff, full, cross]
  patch_type_probs: [0.6, 0.3, 0.1]
  num_generations: 100
  max_api_costs: 0.1
  max_patch_resamples: 3
  max_patch_attempts: 3
  max_novelty_attempts: 3
  job_type: local
  language: python
  llm_models: ["gemini-3-flash-preview", "gpt-5-mini", "gpt-5-nano"]
  llm_kwargs:
    temperatures: [0, 0.5, 1.0]
    reasoning_efforts: [min, low]
    max_tokens: 32768
  meta_rec_interval: 40
  meta_llm_models: ["gpt-5-mini"]
  meta_llm_kwargs:
    temperatures: [0]
    max_tokens: 16384
  embedding_model: text-embedding-3-small
  code_embed_sim_threshold: 0.99
  novelty_llm_models: ["gpt-5-nano"]
  novelty_llm_kwargs:
    temperatures: [0]
  init_program_path: initial.py
  llm_dynamic_selection: ucb1
  llm_dynamic_selection_kwargs:
    exploration_coef: 1
  results_dir: results/results_task
New file: `.claude/skills/shinka-inspect/SKILL.md` (63 additions)
---
name: shinka-inspect
description: Load top-performing Shinka programs into agent context using `shinka.utils.load_programs_to_df`, and emit a compact Markdown bundle for iteration planning.
---

# Shinka Inspect Skill
Extract the strongest programs from a Shinka run and package them into a context file that coding agents can load directly.

## When to Use
Use this skill when:
- A run already produced a results directory and SQLite database
- You want to inspect top-performing programs before launching the next batch
- You want a compact context artifact instead of manually browsing the DB

Do not use this skill when:
- You still need to scaffold a task (`shinka-setup`)
- You need to run evolution batches (`shinka-run`)

## What it does
- Uses `shinka.utils.load_programs_to_df` to read program records
- Ranks programs by `combined_score`
- Selects top-`k` correct programs (fallback to top-`k` overall if no correct rows)
- Writes one Markdown bundle with metadata, ranking table, feedback, and code snippets
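The selection step amounts to a few DataFrame operations. A sketch with hypothetical column names (`id`, `correct`, `combined_score`) — verify them against the frame actually returned by `load_programs_to_df`:

```python
import pandas as pd

def select_top_k(df: pd.DataFrame, k: int = 5) -> pd.DataFrame:
    """Top-k correct programs by combined_score, falling back to all rows."""
    correct_rows = df[df["correct"] == True]  # noqa: E712
    pool = correct_rows if not correct_rows.empty else df  # fallback case
    return pool.sort_values("combined_score", ascending=False).head(k)
```

When no correct rows exist, the fallback branch ranks all programs so the bundle is never empty.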

## Workflow
1. Confirm run artifacts exist
```bash
ls -la <results_dir>
```

2. Generate context bundle
```bash
python skills/shinka-inspect/scripts/inspect_best_programs.py \
--results-dir <results_dir> \
--k 5
```

3. Optional tuning knobs
```bash
python skills/shinka-inspect/scripts/inspect_best_programs.py \
--results-dir <results_dir> \
--k 8 \
--max-code-chars 5000 \
--min-generation 10 \
--out <results_dir>/inspect/top_programs.md
```

4. Load output into agent context
- Default output path: `<results_dir>/shinka_inspect_context.md`
- Use it as the context artifact for next-step mutation planning

## CLI Arguments
- `--results-dir`: Path to run directory (or direct DB file path)
- `--k`: Number of programs to include (default `5`)
- `--out`: Output markdown path (default under results dir)
- `--max-code-chars`: Per-program code truncation cap (default `4000`)
- `--min-generation`: Optional lower bound on generation
- `--include-feedback` / `--no-include-feedback`: Include `text_feedback` blocks

## Notes
- Ranking metric is `combined_score`.
- If no correct rows exist, the script falls back to the top-scoring rows and labels the fallback in the output.
- Script is read-only for run artifacts (writes only the markdown bundle).