Skip to content

Add lean_math task: evolutionary Lean 4 theorem proving via ShinkaEvolve#120

Open
Tyronita wants to merge 1 commit intoSakanaAI:mainfrom
Tyronita:lean-math-bench
Open

Add lean_math task: evolutionary Lean 4 theorem proving via ShinkaEvolve#120
Tyronita wants to merge 1 commit intoSakanaAI:mainfrom
Tyronita:lean-math-bench

Conversation

@Tyronita
Copy link
Copy Markdown

@Tyronita Tyronita commented Apr 9, 2026

  • New task in examples/lean_math/ with 15 theorems across arithmetic/logic tiers
  • initial.py: seed program (2/15 baseline) with EVOLVE-BLOCK over get_proof_tactics()
  • evaluate.py: harness that shells out to lean --stdin and scores compiled proofs
  • run_evo.py: runner supporting small (30-gen) and continuous (1000-gen) configs
  • Install Shinka skills (shinka-setup/run/inspect/convert) for Claude Code

Summary

  • What changed?

Why

  • Why is this change needed?

Linked issue or context

  • Issue number, discussion, or background.

Testing

  • Commands run:
    • uv run ruff check tests --exclude tests/file.py
    • uv run mypy --follow-imports=skip --ignore-missing-imports tests/test_*.py tests/conftest.py
    • uv run --with pytest-cov pytest -q -m "not requires_secrets" --cov=shinka --cov-report=term-missing --cov-report=xml:coverage.xml
  • Optional secret-backed tests:
    • uv run pytest -q -m "requires_secrets"
  • Results:

Risks and compatibility

  • Breaking changes, migrations, limitations, or follow-up work.

Core evolution pipeline evidence

If this PR changes the core program evolution pipeline, provide:

  • Example or benchmark used
  • Baseline used for comparison
  • Exact command/config used
  • Metric comparison
  • Short interpretation

Docs and UI

  • Docs updated if needed
  • Screenshots attached if UI/WebUI behavior changed

- New task in examples/lean_math/ with 15 theorems across arithmetic/logic tiers
- initial.py: seed program (2/15 baseline) with EVOLVE-BLOCK over get_proof_tactics()
- evaluate.py: harness that shells out to lean --stdin and scores compiled proofs
- run_evo.py: runner supporting small (30-gen) and continuous (1000-gen) configs
- Install Shinka skills (shinka-setup/run/inspect/convert) for Claude Code

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant