Cut your OpenClaw / ZeroClaw token bill. Find which model actually earns its cost. Keep the changes that make your agent stronger.
Local · No upload · Works with your existing logs
Live Demo · Quick Start · Preview · FAQ · What it does · Roadmap · 中文 · 日本語 · 한국어 · Español · Français · Deutsch
If you run OpenClaw or ZeroClaw, you already know the problem: token costs compound silently, retries burn budget without warning, and after every model swap or prompt change you're left guessing whether it actually helped.
ClawClip gives you the answer. It reads your local session logs, replays each run step by step, scores the result, and breaks down exactly where the money went — so you can cut waste, pick the right model for each task, and only keep the changes that genuinely make your agent stronger.
ClawClip scans your sessions for retry loops, bloated prompts, verbose outputs, and expensive models doing lightweight work. It surfaces the patterns that quietly inflate your bill — and tells you which ones to fix first.
Run the same task with different models or prompts, then compare them directly: which one scored higher, which one cost less, which one actually improved. No more guessing after a model swap.
Every benchmark run is saved. After a change, you get a before/after comparison — score, token count, cost — with a plain verdict: better, worse, or no real difference. If the score went up but the bill went up more, you'll see that too.
Step through what your agent actually did: every tool call, retry, reasoning block, and response, in order. Find the exact step where it went sideways without digging through raw JSONL.
| What you want to know | What ClawClip shows you |
|---|---|
| Where is my token budget going? | Cost Report breaks spend by model, task type, and session — with waste signals and savings suggestions |
| Is my agent actually getting better? | Agent Scorecard gives a six-dimension verdict after each run, with before/after proof when you make a change |
| What exactly happened in that run? | Run Insights replays every step so you can find the problem without reading raw logs |
| Feature | What it's for |
|---|---|
| Token waste detection | Flags retry loops, context bloat, prompt inefficiency, and model mismatches |
| Model value matrix | Shows which models deliver the best results per dollar across your actual tasks |
| Before/after proof | Compares the last two benchmark runs with a plain verdict |
| Savings suggestions | Prioritizes the changes most likely to cut cost without hurting quality |
| Prompt Efficiency | Checks whether longer prompts and more tokens are actually buying better output |
| Version Compare | Side-by-side comparison of different runs, models, or configs |
| Template Library + Knowledge Base | Reuse what worked, search your history, stop repeating the same experiments |
- Session discovery, parsing, and all analysis run locally.
- ClawClip does not upload your agent run data.
- Pricing refresh is optional — it only updates cost reference numbers, never sends session content.
```shell
git clone https://github.com/Ylsssq926/clawclip.git
cd clawclip && npm install
npm start
```

Open http://localhost:8080. The built-in demo sessions load immediately — no setup needed. When you're ready, point ClawClip at your own OpenClaw or ZeroClaw logs.
Primary support: OpenClaw and ZeroClaw official session formats.
Also works with: any local JSONL-based agent workflow — coverage expands as the parser grows.
```shell
# Point at a custom log directory
CLAWCLIP_LOBSTER_DIRS=/path/to/your/sessions npm start
```

| Source | Notes |
|---|---|
| `~/.openclaw/` | Auto-discovered at startup |
| `OPENCLAW_STATE_DIR` | Override the default OpenClaw state path |
| `CLAWCLIP_LOBSTER_DIRS` | Add extra folders (comma or semicolon separated) |
| Built-in demo sessions | Available immediately, no real data needed |
| ZeroClaw exports / other JSONL | Supported progressively |
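Since `CLAWCLIP_LOBSTER_DIRS` accepts multiple folders separated by commas or semicolons, splitting such a value might look like this (a sketch for illustration, not ClawClip's actual parsing code):

```javascript
// Split a comma- or semicolon-separated directory list,
// trimming whitespace and dropping empty entries.
function parseDirList(value) {
  return value
    .split(/[,;]/)
    .map((dir) => dir.trim())
    .filter((dir) => dir.length > 0);
}

console.log(parseDirList("/data/agent-logs;/backup/sessions, /tmp/runs"));
// → [ '/data/agent-logs', '/backup/sessions', '/tmp/runs' ]
```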
The Agent Scorecard is a heuristic diagnostic — not a standardized benchmark. It reads signals from your actual sessions (response quality, tool use, safety patterns, cost structure) to help you compare iterations faster. Use it for relative improvement tracking, not absolute rankings.
- Shareable reports: export a full Replay + Scorecard + Cost summary as a static snapshot
- AI-assisted diagnosis: LLM-powered second opinion on top of the existing heuristics
- Broader agent framework support: more local JSONL runtimes beyond OpenClaw / ZeroClaw
- Real-time monitoring: live session ingestion as runs happen
- QQ Group: 892555092
- GitHub Discussions: Ask a question or share what you built
- Issues: Report a bug or request a feature
Built with 🍤 by Luelan (掠蓝)
If ClawClip helped you cut your token bill or find a better model config, a ⭐ goes a long way.
