An autonomous AI researcher that runs on any computer. No GPU required.
Fork of karpathy/autoresearch. The original needs an H100. This runs on your laptop while you sleep.
val_bpb: 2.287 (baseline) → 2.226 (after autonomous tuning)
- A small local LLM (Qwen 2.5 0.5B via prima.cpp server) suggests hyperparameter changes
- `train.py` runs a 5-minute training experiment
- If the result improves, it's committed automatically
- Repeat — ~12 experiments per hour, ~100 while you sleep
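The loop is simple enough to sketch in a few lines. In this illustration every helper is a stand-in (the real `agent.py` talks to the local LLM, edits `train.py`, and runs real training):

```python
import random

random.seed(0)

# Stand-ins for the real steps: the actual agent asks the local LLM for an
# edit, applies it to train.py, and runs a 5-minute training job.
def suggest_change():
    return {"lr": random.choice([1e-3, 3e-4, 1e-4])}

def run_training(change):
    # Pretend result; the real agent parses val_bpb from train.py's output.
    return 2.3 - 0.1 * random.random()

def research_loop(best_bpb=2.287, rounds=10):
    for _ in range(rounds):
        change = suggest_change()
        bpb = run_training(change)
        if bpb < best_bpb:
            best_bpb = bpb  # the real agent commits the change here
        # otherwise the change is reverted
    return best_bpb

print(research_loop())
```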
Sample output:

```
step 00142 (100.0%) | loss: 2.226145 | epoch: 0 | remaining: 0s
---
val_bpb: 2.226000
training_seconds: 300.0
num_params_M: 0.8
```
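The agent can recover the metrics from this summary block with a few lines of parsing (a sketch assuming the `key: value` format shown above):

```python
def parse_summary(text):
    """Parse `key: value` summary lines into a dict of floats."""
    metrics = {}
    for line in text.splitlines():
        if ":" in line and not line.startswith("step"):
            key, _, value = line.partition(":")
            try:
                metrics[key.strip()] = float(value)
            except ValueError:
                pass  # skip non-numeric lines
    return metrics

summary = "val_bpb: 2.226000\ntraining_seconds: 300.0\nnum_params_M: 0.8"
print(parse_summary(summary)["val_bpb"])  # 2.226
```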
```bash
# 1. Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# 2. Clone and install
git clone https://github.com/bopalvelut-prog/autoresearch.git
cd autoresearch && uv sync

# 3. Download data (one-time)
uv run prepare.py

# 4. Run a single experiment
uv run train.py

# 5. Start prima.cpp server (recommended)
prima-cli -m ~/.cache/autoresearch/qwen2.5-0.5b-instruct-q4_k_m.gguf \
    --port 8080 -c 2048 --threads 4

# 6. Let the agent run overnight
python agent.py
```

Works on Linux, macOS, and Windows. Auto-detects CPU / Apple Silicon / NVIDIA GPU.
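The auto-detection presumably boils down to something like this (an illustrative sketch, not the repo's actual code; the `ImportError` fallback keeps it runnable without torch installed):

```python
def pick_device():
    """Best available torch device: CUDA, then Apple MPS, then CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

print(pick_device())
```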
Note: Ollama is avoided — it's bloated (~2GB) and requires root. prima.cpp is lightweight (~150MB) and builds from source.
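Talking to the server is plain HTTP. The sketch below assumes an OpenAI-compatible `/v1/chat/completions` endpoint, as llama.cpp's server exposes; check that your prima.cpp build does the same:

```python
import json
import urllib.request

def build_payload(prompt, max_tokens=256):
    """Request body for an OpenAI-style chat completion."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "max_tokens": max_tokens,
    }

def ask_llm(prompt, url="http://127.0.0.1:8080/v1/chat/completions"):
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```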
Four files:

| File | Purpose |
|---|---|
| `prepare.py` | Data download, tokenizer, evaluation. Don't touch. |
| `train.py` | GPT model + optimizer + training loop. The agent edits this. |
| `program.md` | Instructions for the agent. You edit this. |
| `agent.py` | Autonomous research loop with prima.cpp + JSON logging. |
All experiments use a fixed 5-minute time budget. The metric is val_bpb (validation bits per byte) — lower is better.
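For reference, converting the model's cross-entropy loss (natural-log units, per token) into bits per byte is just a change of base plus normalizing by bytes instead of tokens (the repo's exact bookkeeping may differ):

```python
import math

def bits_per_byte(total_loss_nats, total_bytes):
    """Summed cross-entropy over the validation split, in bits per byte."""
    return total_loss_nats / (math.log(2) * total_bytes)

# A model at 8 bits per byte has learned nothing about the data:
print(bits_per_byte(math.log(2) * 8 * 1000, 1000))  # 8.0
```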
Every experiment is logged to:
- `results.tsv` — flat TSV for quick viewing
- `results/run_*.json` — structured JSON per run
- `results/experiments.csv` — aggregate CSV for analysis
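Any of these can be inspected by hand; for example, picking the best run out of `results.tsv` (the column names here are hypothetical):

```python
import csv
import io

# Stand-in for open("results.tsv"); the real column names may differ.
tsv = "run\tval_bpb\nrun_001\t2.287\nrun_002\t2.226\n"

rows = list(csv.DictReader(io.StringIO(tsv), delimiter="\t"))
best = min(rows, key=lambda r: float(r["val_bpb"]))
print(best["run"], best["val_bpb"])  # run_002 2.226
```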
View your leaderboard:
```bash
uv run leaderboard.py --format md --top 10
uv run leaderboard.py --format json --export
```

The defaults are conservative (`DEPTH=2`, 0.8M params). For faster machines:
```python
# In train.py:
DEPTH = 4                 # More layers = better quality, slower
TOTAL_BATCH_SIZE = 2**15  # 32768 tokens
DEVICE_BATCH_SIZE = 8
WINDOW_PATTERN = "L"      # Full attention (faster on beefy CPUs)
```

For weaker hardware (phones, old laptops):
```python
DEPTH = 1
TOTAL_BATCH_SIZE = 2**12  # 4096 tokens
MAX_SEQ_LEN = 128         # In prepare.py
```

Community forks:

| Fork | Platform | Notes |
|---|---|---|
| miolini/autoresearch-macos | macOS | MPS optimized |
| jsegov/autoresearch-win-rtx | Windows | NVIDIA RTX |
MIT. Built on karpathy/autoresearch.