feat: add real estate scorer benchmark task #115

Open
TomerWeissman wants to merge 1 commit into SakanaAI:main from TomerWeissman:feat/real-estate-scorer-task
Conversation

@TomerWeissman
Summary

  • Adds tasks/real_estate_scorer/ — a new benchmark that evolves a Python scoring function for Buenos Aires apartment investment quality
  • Fitness metric: Spearman rank correlation between evolved scores and ground-truth price_per_m2 rankings on a held-out 10-listing test set
  • Baseline (linear weighted sum) achieves 0.9030 Spearman correlation; evolution should discover better feature interactions
  • Includes: dataset generator (data.py, seed=42), ShinkaEvolve evaluator (evaluate.py), evolvable initial program (initial.py), standalone baseline (baseline.py), evolution runner + YAML config for 10-generation runs
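
The baseline and fitness metric described above can be sketched as follows. This is an illustrative reconstruction, not the PR's actual code: the function names, feature keys, and the pure-Python Spearman implementation are assumptions (the real `baseline.py`/`evaluate.py` presumably use `scipy.stats.spearmanr`).

```python
def baseline_score(listing, weights):
    # Hypothetical baseline: a linear weighted sum over listing features.
    # Feature keys are made up for illustration.
    return sum(weights[k] * listing[k] for k in weights)

def _ranks(xs):
    # Average ranks (1-based) with tie handling, as Spearman requires.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the tied positions
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(a, b):
    # Spearman rank correlation = Pearson correlation of the ranks.
    ra, rb = _ranks(a), _ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / (va * vb) ** 0.5
```

Fitness would then be `spearman(evolved_scores, ground_truth_price_per_m2)` over the 10-listing test set; a perfectly monotone scorer yields 1.0.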

Test plan

  • python -m tasks.real_estate_scorer.data generates deterministic train/test JSON
  • python -m tasks.real_estate_scorer.baseline prints Spearman 0.9030
  • python evaluate.py --program_path initial.py produces combined_score: 0.903 and correct: true
  • ruff check tasks/ passes clean
  • python run_evo.py --config_path shinka.yaml runs 10-generation evolution (requires LLM API keys)

🤖 Generated with Claude Code

New example task that evolves a scoring function for Buenos Aires apartment
investment quality, using Spearman rank correlation as the fitness metric.
Includes dataset generator, evaluator, baseline (0.903 Spearman), and
ShinkaEvolve config for 10-generation runs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@RobertTLange
Collaborator

Thanks for putting this together. As it stands, this looks more like a standalone example/benchmark task than something that demonstrates a novel intrinsic of ShinkaEvolve itself.

At the moment, we are not looking to merge example use-cases that do not clearly exemplify new ShinkaEvolve capabilities or behaviors. In practice, that means examples should highlight something framework-specific, such as a new execution mode, evaluator pattern, language/backend capability, or a core optimization/evolution behavior.

Are you planning to extend this PR to showcase any of those kinds of ShinkaEvolve-specific intrinsics? If yes, please call that out explicitly in the PR description and align the example and results around that. If not, I do not think this is a fit for merge in its current form.

Please also refer to the contribution guidelines, especially the note that we should not add random benchmark tasks or examples just to justify a PR, and that representative tasks should highlight the capability being changed.
