Offload bulk I/O from Claude Code to cheap LLMs. Save thousands of tokens on file reading, boilerplate generation, and doc updates. Worker calls cost ~$0.02; primary model focuses on architecture.
```bash
git clone https://github.com/imkunal007219/claude-coworker-model.git
cd claude-coworker-model
./setup.sh
export WORKER_API_KEY="your-key"
export WORKER_BASE_URL="https://api.moonshot.ai/v1"
export WORKER_MODEL="kimi-k2.5"
ask-kimi --paths src/*.py --question "Find all SQL injection risks"
```

The expensive model (Claude) handles reasoning and architecture. The cheap worker model handles token-heavy I/O:
- Read: Worker ingests large codebases, returns structured summaries with file paths and line numbers
- Generate: Worker produces boilerplate using existing files as style references
- Extract: Worker parses session transcripts for documentation
Pattern: Claude decides what to do; the worker does the reading/writing.
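The split can be sketched with a toy stand-in that makes no API calls (`worker_summarize` below is a hypothetical stub, not part of this repo): the worker reads the large input, and only a compressed summary ever enters the primary model's context.

```shell
# Stub "worker": compresses a large input down to a short summary.
worker_summarize() { head -n 3 "$1"; }

seq 1 10000 > /tmp/big_file.txt           # stand-in for a 10,000-line codebase
worker_summarize /tmp/big_file.txt > /tmp/summary.txt

# Only these 3 lines would reach the primary model's context:
wc -l < /tmp/summary.txt
```

The same shape holds for the real tools: `ask-kimi` plays the worker role, and its structured summary is what Claude actually reads.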
Three environment variables configure any OpenAI-compatible provider:
| Variable | Purpose | Example |
|---|---|---|
| `WORKER_API_KEY` | API authentication | `sk-abc123` |
| `WORKER_BASE_URL` | Provider endpoint | `https://api.moonshot.ai/v1` |
| `WORKER_MODEL` | Model identifier | `kimi-k2.5` |
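A quick sanity check (POSIX shell; the exported values here are placeholders) confirms all three variables are set before the first worker call:

```shell
# Placeholder values -- substitute your real provider settings.
export WORKER_API_KEY="sk-abc123"
export WORKER_BASE_URL="https://api.moonshot.ai/v1"
export WORKER_MODEL="kimi-k2.5"

# Prints nothing when the configuration is complete.
for v in WORKER_API_KEY WORKER_BASE_URL WORKER_MODEL; do
  eval "val=\${$v}"
  [ -n "$val" ] || echo "missing: $v"
done
```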
### Kimi (Moonshot AI)

```bash
export WORKER_API_KEY="$MOONSHOT_API_KEY"
export WORKER_BASE_URL="https://api.moonshot.ai/v1"
export WORKER_MODEL="kimi-k2.5"
```

### DeepSeek

```bash
export WORKER_API_KEY="$DEEPSEEK_API_KEY"
export WORKER_BASE_URL="https://api.deepseek.com/v1"
export WORKER_MODEL="deepseek-chat"
```

### Ollama (local)

```bash
export WORKER_API_KEY="ollama"
export WORKER_BASE_URL="http://localhost:11434/v1"
export WORKER_MODEL="qwen2.5-coder:14b"
```

## ask-kimi

Delegate bulk reading to the worker model. Returns structured bullets, not prose.
```bash
# Analyze multiple files for security issues
ask-kimi \
  --paths auth.py database.py utils.py \
  --question "Identify all unvalidated inputs" \
  --max-tokens 8192

# Generate API documentation from source
ask-kimi \
  --paths src/**/*.ts \
  --question "List all exported functions with their arguments"
```

Flags:

- `--paths`: Files to ingest (supports globs)
- `--question`: Specific extraction query
- `--max-tokens`: Total budget including reasoning tokens
- `--model`: Override `WORKER_MODEL`
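Because unquoted globs are expanded by the invoking shell before any command sees them, you can preview exactly which paths a call would send (and so roughly how many tokens it would cost) before spending anything. A minimal preview using throwaway files; `/tmp/demo_src` is purely illustrative:

```shell
# Create two throwaway files for the demonstration.
mkdir -p /tmp/demo_src
touch /tmp/demo_src/a.py /tmp/demo_src/b.py

# These exact paths are what an unquoted `--paths /tmp/demo_src/*.py` would receive:
printf '%s\n' /tmp/demo_src/*.py
```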
## kimi-write

Generate code or documentation using an existing file as a style reference.
```bash
# Generate tests matching existing style
kimi-write \
  --spec "Write pytest tests for auth.py covering OAuth2 flow" \
  --context tests/test_main.py \
  --target tests/test_auth.py

# Create API docs matching current format
kimi-write \
  --spec "Document the new /v2/users endpoint" \
  --context docs/endpoints.md \
  --target docs/endpoints_v2.md
```

Flags:

- `--spec`: What to write (generation instructions)
- `--context`: Reference file to mimic (style, imports, structure)
- `--target`: Output file path
- `--max-tokens`: Token budget for reasoning + output (default 16384)
## extract-chat

Convert Claude Code JSONL session logs to human-readable text.
```bash
# Extract last session to stdout
extract-chat ~/.claude/projects/my-project/session.jsonl

# Write to file
extract-chat ~/.claude/projects/my-project/session.jsonl -o /tmp/chat.txt

# Pipe to ask-kimi for doc updates
extract-chat session.jsonl -o /tmp/chat.txt && \
  ask-kimi --paths /tmp/chat.txt docs/README.md --question "What doc updates are needed?"
```

Copy `CLAUDE.md.template` to your project root as `CLAUDE.md`. This provides routing rules that tell Claude when to delegate:
```markdown
## Worker Delegation Rules

When asked to analyze, summarize, or search across multiple files:
DELEGATE to ask-kimi with relevant file paths.

When asked to generate boilerplate, tests, or documentation:
DELEGATE to kimi-write with appropriate reference files.

When asked to review session history:
DELEGATE to extract-chat.

DO NOT delegate:
- Architecture decisions
- Debugging complex logic
- Refactoring plans
```

Add `CLAUDE.md` to your repository so Claude Code loads it automatically on startup.
| Metric | Before | After |
|---|---|---|
| Claude Pro weekly limit | Hit by Wednesday | Never hit |
| Token usage per session | 80%+ on file reading | 20% (summaries only) |
| 3-week worker API cost | — | $0.38 total |
| Context window usage | 80% reading files | 20% reading summaries |
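The intro's "~$0.02 per worker call" figure is easy to reproduce. Assuming a worker price of $0.30 per 1M tokens (illustrative; check your provider's actual pricing) and a 60K-token bulk read:

```shell
# cost = tokens / 1e6 * price per million tokens  ->  prints $0.018
awk 'BEGIN { tokens = 60000; price_per_m = 0.30; printf "$%.3f\n", tokens / 1e6 * price_per_m }'
```

At that rate, the $0.38 three-week total in the table corresponds to roughly twenty such bulk reads.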
Based on the pattern described in this implementation: (medium link), (Reddit link). 567K views on Reddit, 7.2K on Medium.
Kunal Bhardwaj — Systems engineer working on autonomous drones and AI-powered developer tools. Building at the intersection of embedded systems and LLM workflows.
- Blog: medium.com/@kunalbhardwaj
- LinkedIn: linkedin.com/in/kunalbhardwaj
PRs welcome. Focus areas: additional provider templates, token usage optimization, and extracting structured data from more session formats.
MIT License. See LICENSE.