Offload bulk I/O from Claude Code to cheap LLMs. Save thousands of tokens on file reading, boilerplate generation, and doc updates. Worker calls cost ~$0.02; primary model focuses on architecture.
```bash
git clone https://github.com/imkunal007219/claude-coworker-model.git
cd claude-coworker-model
./setup.sh
export WORKER_API_KEY="your-key"
export WORKER_BASE_URL="https://api.moonshot.ai/v1"
export WORKER_MODEL="kimi-k2.5"
ask-kimi --paths src/*.py --question "Find all SQL injection risks"
```

The expensive model (Claude) handles reasoning and architecture. The cheap worker model handles token-heavy I/O:
- Read: Worker ingests large codebases, returns structured summaries with file paths and line numbers
- Generate: Worker produces boilerplate using existing files as style references
- Extract: Worker parses session transcripts for documentation
Pattern: Claude decides what to do; the worker does the reading/writing.
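The split can be sketched with a toy stand-in that makes no API calls (`worker_summarize` below is a hypothetical stub, not part of this repo): the worker reads the large input, and only a compressed summary ever enters the primary model's context.

```shell
# Stub "worker": compresses a large input down to a short summary.
worker_summarize() { head -n 3 "$1"; }

seq 1 10000 > /tmp/big_file.txt           # stand-in for a 10,000-line codebase
worker_summarize /tmp/big_file.txt > /tmp/summary.txt

# Only these 3 lines would reach the primary model's context:
wc -l < /tmp/summary.txt
```

The same shape holds for the real tools: `ask-kimi` plays the worker role, and its structured summary is what Claude actually reads.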
Three environment variables configure any OpenAI-compatible provider:
| Variable | Purpose | Example |
|---|---|---|
| `WORKER_API_KEY` | API authentication | `sk-abc123` |
| `WORKER_BASE_URL` | Provider endpoint | `https://api.moonshot.ai/v1` |
| `WORKER_MODEL` | Model identifier | `kimi-k2.5` |
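A quick sanity check (POSIX shell; the exported values here are placeholders) confirms all three variables are set before the first worker call:

```shell
# Placeholder values -- substitute your real provider settings.
export WORKER_API_KEY="sk-abc123"
export WORKER_BASE_URL="https://api.moonshot.ai/v1"
export WORKER_MODEL="kimi-k2.5"

# Prints nothing when the configuration is complete.
for v in WORKER_API_KEY WORKER_BASE_URL WORKER_MODEL; do
  eval "val=\${$v}"
  [ -n "$val" ] || echo "missing: $v"
done
```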
### Kimi (Moonshot AI)

```bash
export WORKER_API_KEY="$MOONSHOT_API_KEY"
export WORKER_BASE_URL="https://api.moonshot.ai/v1"
export WORKER_MODEL="kimi-k2.5"
```

### DeepSeek

```bash
export WORKER_API_KEY="$DEEPSEEK_API_KEY"
export WORKER_BASE_URL="https://api.deepseek.com/v1"
export WORKER_MODEL="deepseek-chat"
```

### Ollama (local)

```bash
export WORKER_API_KEY="ollama"
export WORKER_BASE_URL="http://localhost:11434/v1"
export WORKER_MODEL="qwen2.5-coder:14b"
```

## ask-kimi

Delegate bulk reading to the worker model. Returns structured bullets, not prose.
```bash
# Analyze multiple files for security issues
ask-kimi \
  --paths auth.py database.py utils.py \
  --question "Identify all unvalidated inputs" \
  --max-tokens 8192

# Generate API documentation from source
ask-kimi \
  --paths src/**/*.ts \
  --question "List all exported functions with their arguments"
```

Flags:

- `--paths`: Files to ingest (supports globs)
- `--question`: Specific extraction query
- `--max-tokens`: Total budget including reasoning tokens
- `--model`: Override `WORKER_MODEL`
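Because unquoted globs are expanded by the invoking shell before any command sees them, you can preview exactly which paths a call would send (and so roughly how many tokens it would cost) before spending anything. A minimal preview using throwaway files; `/tmp/demo_src` is purely illustrative:

```shell
# Create two throwaway files for the demonstration.
mkdir -p /tmp/demo_src
touch /tmp/demo_src/a.py /tmp/demo_src/b.py

# These exact paths are what an unquoted `--paths /tmp/demo_src/*.py` would receive:
printf '%s\n' /tmp/demo_src/*.py
```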
## kimi-write

Generate code or documentation using an existing file as a style reference.
```bash
# Generate tests matching existing style
kimi-write \
  --spec "Write pytest tests for auth.py covering OAuth2 flow" \
  --context tests/test_main.py \
  --target tests/test_auth.py

# Create API docs matching current format
kimi-write \
  --spec "Document the new /v2/users endpoint" \
  --context docs/endpoints.md \
  --target docs/endpoints_v2.md
```

Flags:

- `--spec`: What to write (generation instructions)
- `--context`: Reference file to mimic (style, imports, structure)
- `--target`: Output file path
- `--max-tokens`: Token budget for reasoning + output (default 16384)
## extract-chat

Convert Claude Code JSONL session logs to human-readable text.
```bash
# Extract last session to stdout
extract-chat ~/.claude/projects/my-project/session.jsonl

# Write to file
extract-chat ~/.claude/projects/my-project/session.jsonl -o /tmp/chat.txt

# Pipe to ask-kimi for doc updates
extract-chat session.jsonl -o /tmp/chat.txt && \
  ask-kimi --paths /tmp/chat.txt docs/README.md --question "What doc updates are needed?"
```

Copy `CLAUDE.md.template` to your project root as `CLAUDE.md`. This provides routing rules that tell Claude when to delegate:
```markdown
## Worker Delegation Rules

When asked to analyze, summarize, or search across multiple files:
DELEGATE to ask-kimi with relevant file paths.

When asked to generate boilerplate, tests, or documentation:
DELEGATE to kimi-write with appropriate reference files.

When asked to review session history:
DELEGATE to extract-chat.

DO NOT delegate:
- Architecture decisions
- Debugging complex logic
- Refactoring plans
```

Add `CLAUDE.md` to your repository so Claude Code loads it automatically on startup.
| Metric | Before | After |
|---|---|---|
| Claude Pro weekly limit | Hit by Wednesday | Never hit |
| Token usage per session | 80%+ on file reading | 20% (summaries only) |
| 3-week worker API cost | — | $0.38 total |
| Context window usage | 80% reading files | 20% reading summaries |
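The intro's "~$0.02 per worker call" figure is easy to reproduce. Assuming a worker price of $0.30 per 1M tokens (illustrative; check your provider's actual pricing) and a 60K-token bulk read:

```shell
# cost = tokens / 1e6 * price per million tokens  ->  prints $0.018
awk 'BEGIN { tokens = 60000; price_per_m = 0.30; printf "$%.3f\n", tokens / 1e6 * price_per_m }'
```

At that rate, the $0.38 three-week total in the table corresponds to roughly twenty such bulk reads.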
Based on the pattern described in this implementation: (medium link), (Reddit link). 567K views on Reddit, 7.2K on Medium.
Kunal Bhardwaj — Systems engineer working on autonomous drones and AI-powered developer tools. Building at the intersection of embedded systems and LLM workflows.
- Blog: medium.com/@kunalbhardwaj
- LinkedIn: linkedin.com/in/kunalbhardwaj
PRs welcome. Focus areas: additional provider templates, token usage optimization, and extracting structured data from more session formats.
MIT License. See LICENSE.