feat: add real estate scorer benchmark task #115
TomerWeissman wants to merge 1 commit into SakanaAI:main from
New example task that evolves a scoring function for Buenos Aires apartment investment quality, using Spearman rank correlation as the fitness metric. Includes dataset generator, evaluator, baseline (0.903 Spearman), and ShinkaEvolve config for 10-generation runs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
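For readers unfamiliar with the fitness metric, here is a minimal sketch of Spearman rank correlation in plain Python. It is not the PR's `evaluate.py`; the function names `average_ranks` and `spearman` are hypothetical and chosen only for illustration. The idea is that the scorer is judged on how well it orders the apartments, not on the absolute values it assigns.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Pearson correlation of the two rank vectors (Spearman's rho)."""
    if len(predicted) != len(target) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(predicted), average_ranks(target)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        # A constant score vector carries no ranking information.
        return 0.0
    return cov / (var_x * var_y) ** 0.5
```

Because only ranks matter, any monotonic rescaling of the evolved scores leaves the fitness unchanged, which is one reason a rank-based metric suits a scoring-function task.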
Thanks for putting this together. As it stands, this looks more like a standalone example/benchmark task than something that demonstrates a novel intrinsic of ShinkaEvolve itself.

At the moment, we are not looking to merge example use-cases that do not clearly exemplify new ShinkaEvolve capabilities or behaviors. In practice, that means examples should highlight something framework-specific, such as a new execution mode, evaluator pattern, language/backend capability, or a core optimization/evolution behavior.

Are you planning to extend this PR to showcase any of those kinds of ShinkaEvolve-specific intrinsics? If yes, please call that out explicitly in the PR description and align the example and results around that. If not, I do not think this is a fit for merge in its current form.

Please also refer to the contribution guidelines, especially the note that we should not add random benchmark tasks or examples just to justify a PR, and that representative tasks should highlight the capability being changed.
Summary
- `tasks/real_estate_scorer/`: a new benchmark that evolves a Python scoring function for Buenos Aires apartment investment quality
- Dataset generator (`data.py`, seed=42), ShinkaEvolve evaluator (`evaluate.py`), evolvable initial program (`initial.py`), standalone baseline (`baseline.py`), evolution runner + YAML config for 10-generation runs

Test plan
- `python -m tasks.real_estate_scorer.data` generates deterministic train/test JSON
- `python -m tasks.real_estate_scorer.baseline` prints Spearman 0.9030
- `python evaluate.py --program_path initial.py` produces `combined_score: 0.903` and `correct: true`
- `ruff check tasks/` passes clean
- `python run_evo.py --config_path shinka.yaml` runs 10-generation evolution (requires LLM API keys)

🤖 Generated with Claude Code
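The "deterministic train/test JSON" claim in the test plan can be illustrated with a small sketch of seeded generation. This is an assumption about how such a generator might work, not the contents of the PR's `data.py`: the function names and the fields `area_m2` and `rooms` are hypothetical, and only the default seed of 42 comes from the PR description.

```python
import json
import random


def generate_listings(n: int, seed: int = 42) -> list[dict]:
    """Build n synthetic listings from a local, seeded RNG (hypothetical fields)."""
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    listings = []
    for i in range(n):
        listings.append(
            {
                "id": i,
                "area_m2": round(rng.uniform(25.0, 200.0), 1),
                "rooms": rng.randint(1, 5),
            }
        )
    return listings


def to_json(listings: list[dict]) -> str:
    """Serialize with sorted keys so equal data yields byte-identical text."""
    return json.dumps(listings, sort_keys=True)


if __name__ == "__main__":
    # Two runs with the same seed should serialize identically.
    print(to_json(generate_listings(3)) == to_json(generate_listings(3)))
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the output reproducible even if other code draws random numbers in between.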