A human judgment registry for AI-assisted development.
AI writes code now. It writes a lot of it, and it writes it fast. The old signals that distinguished good engineers from average ones are dissolving. Clean commit history? Claude can do that. Sensible architecture? Claude has opinions. Tests, docs, types? Auto-generated.
What AI can't do is explain why you told it no.
The most valuable thing a developer does in an AI-assisted workflow isn't writing code. It's overriding suggestions, rejecting approaches, choosing constraints, and making tradeoffs that require context the model doesn't have. That judgment is invisible unless you capture it.
/why captures it.
/why is a Claude Code skill that creates structured records of human technical decisions. It has two modes:
Quick mode — triggers automatically when you override Claude mid-conversation ("scratch that", "I'd rather", "instead let's..."). Asks one question: What did you decide and why? Low friction. Takes 10 seconds.
Full mode — triggered explicitly with /why. Walks you through five questions:
- What problem were you solving?
- What did Claude suggest that you rejected, and why?
- What did you decide and what was your reasoning?
- What parts of the output did you write or override yourself?
- What would break this, and do you understand why?
Both modes save structured markdown files to a decisions/ directory with full metadata: date, time, branch, files, tags, role, rubric used, mode, and health score. Claude fills in the metadata. You provide the reasoning. Always.
Every entry gets an automatic Reasoning Health Check — a score from 1-10 and short flag tags that surface potential blind spots, unstated assumptions, or logical gaps in your reasoning. Scoring adapts to who's making the decision (role) and what rubric your team uses. Run /why expand to get the full breakdown appended to the entry.
```shell
git clone https://github.com/marcusrein/why.git /tmp/why
cp -r /tmp/why/skill .claude/skills/why
rm -rf /tmp/why
```

That's it. Claude Code picks up SKILL.md automatically from `.claude/skills/why/`.
To make every Claude instance in your project decision-aware, add this to your project's CLAUDE.md:
```markdown
## /why — Team Decision Context

When working in this project, be aware of the `decisions/` directory at `.claude/skills/why/decisions/`. It contains structured records of human technical decisions made during AI-assisted development.

**Before suggesting an approach:** Check if a prior decision in `decisions/` already addressed the same system, files, or tradeoff. If so, reference it and build on it rather than re-litigating.

**When a decision contradicts a prior one:** Surface it. "Note: this reverses the approach from decisions/2026-03-10-no-orm.md — is that intentional?"

**Weekly digests:** Files matching `decisions/DIGEST-*.md` summarize team patterns. Read the most recent digest to understand recurring flags and team focus areas.
```

Type /why in Claude Code to record a decision in full mode.
The skill activates in quick mode when you use override language:
- "actually let's do..."
- "I don't want to use..."
- "instead let's..."
- "I'd rather..."
- "let's go with..."
- "override"
- "scratch that"
- "no, do it this way"
- "forget that approach"
- "I'm going to go with..."
Claude asks one question — What did you decide and why? — then saves the entry. Quick and out of the way.
Role calibrates how the health check evaluates your reasoning. Set it explicitly:
- Say "I'm a junior engineer" or "CTO here" in conversation
- Run `/why role [role]` to set it for the session
Valid roles: cto, staff, senior, mid, junior. If you don't set a role, Claude infers it from the scope of the decision. Explicit always wins over inferred.
Each decision becomes a timestamped markdown file:
```
decisions/
  2026-03-10-custom-session-handler.md
  2026-03-12-sqlite-over-postgres.md
  2026-03-14-manual-csv-parser.md
  2026-03-18-no-orm.md
```
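The naming convention is simple enough to reproduce by hand: ISO date plus a kebab-case slug of the decision. A sketch — the `slugify` helper here is illustrative, not part of the skill:

```shell
# Build a decision filename the way /why entries are named:
# ISO date + kebab-case slug of the decision title.
slugify() {
  # Lowercase, collapse non-alphanumerics to hyphens, trim stray hyphens.
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | sed -E 's/[^a-z0-9]+/-/g; s/(^-+)|(-+$)//g'
}

echo "decisions/$(date +%F)-$(slugify 'SQLite over Postgres').md"
```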
Entry frontmatter includes:
```yaml
date: 2026-03-15
time: 14:32
branch: feature/user-auth
files: [src/auth/session.ts, src/auth/middleware.ts]
tags: [dependency, architecture, security]
role: senior
rubric: default
mode: full
health_score: 8
```

Every entry automatically gets a health check below the developer's answers:
```
Score: 8/10
Flags: session-fixation(med), team-coupling(low)
Related: [2026-03-12-auth-middleware.md]

Run /why expand for details.
```
The score is a weighted average across the active rubric's dimensions. Flags are short slugs with severity (low, med, high) that surface specific concerns. The `Related:` line appears when prior decisions in `decisions/` touch the same files, tags, or system area — linking the team's reasoning together over time. If a decision contradicts a prior one without acknowledgment, it gets flagged with `contradicts-prior(med)`. AI analysis never mixes into the developer's answers — it lives below a `---` separator.
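The weighted average itself is ordinary arithmetic. A sketch in awk with hypothetical per-dimension scores against the security-focused rubric — blind spots at 40%; the remaining four dimensions at 15% each is an assumption for illustration, not the rubric's documented split:

```shell
# Hypothetical dimension scores (1-10) for one decision, combined with
# security-focused weights: blind spots 40%, other dimensions 15% each
# (the 15% split is assumed here for illustration).
result=$(awk 'BEGIN {
  s["evidence"]=8;     w["evidence"]=0.15
  s["assumptions"]=7;  w["assumptions"]=0.15
  s["blind_spots"]=6;  w["blind_spots"]=0.40
  s["threat_model"]=9; w["threat_model"]=0.15
  s["confidence"]=8;   w["confidence"]=0.15
  for (d in s) total += s[d] * w[d]
  printf "health_score: %.1f", total
}')
echo "$result"   # health_score: 7.2
```

Note how the heavy blind-spots weight drags an otherwise-strong decision down: one weak dimension at 40% costs more than two strong ones at 15%.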
Run /why expand and the detailed analysis gets appended to the most recent decision file:
- Flag breakdown — one sentence per flag explaining what was detected
- Assumptions surfaced — implicit assumptions stated explicitly
- Blind spots — failure modes the developer didn't address
- Confidence calibration — where certainty matched or exceeded evidence
Rubrics define how decisions are scored. Each rubric has weighted dimensions and role-specific calibration guidance. Ships with three:
| Rubric | Optimizes for | Dimensions | Key weight |
|---|---|---|---|
| `default` | General-purpose reasoning | 4: evidence, assumptions, blind spots, confidence | Equal (25% each) |
| `security-focused` | Security posture | 5: evidence, assumptions, blind spots, threat model awareness, confidence | Blind spots at 40% |
| `startup-velocity` | Speed and pragmatism | 5: evidence, assumptions, blind spots, reversibility assessment, confidence | Evidence at 40% |
Set your team's rubric in SKILL.md frontmatter:
```yaml
rubric: security-focused
```

The skill resolves rubrics relative to the SKILL.md file location (`rubrics/[name].md` in the same directory). Falls back to `default` if the specified rubric doesn't exist.
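The resolution rule can be sketched in a few lines of shell — illustrative only; the skill does this internally, not via a script, and the temp-dir setup here just stands in for the skill directory:

```shell
# Resolve a rubric name to a file next to SKILL.md, falling back to
# default.md when the named rubric file is missing. Illustrative sketch.
skill_dir="$(mktemp -d)"          # stand-in for .claude/skills/why
mkdir -p "$skill_dir/rubrics"
touch "$skill_dir/rubrics/default.md"

resolve_rubric() {
  f="$skill_dir/rubrics/$1.md"
  if [ -f "$f" ]; then echo "$f"; else echo "$skill_dir/rubrics/default.md"; fi
}

resolve_rubric security-focused   # file absent here, so default.md wins
```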
Write your own rubric for your team's values. See docs/custom-rubrics.md.
Every rubric includes a Role calibration section that adjusts which dimensions matter most based on who's making the decision. This varies by rubric — here's how the three shipped rubrics differ:
Default rubric:
- CTO/Executive — Evidence specificity weighs heavier. These decisions are expensive to reverse.
- Staff/Senior — All dimensions weighted equally. Expected to articulate tradeoffs explicitly.
- Mid-level — Assumption awareness is the key signal. Are they aware of what they don't know?
- Junior — Confidence calibration matters most. Appropriate uncertainty scores higher than false confidence.
Security-focused rubric:
- CTO/Executive — Threat model awareness is paramount. Evidence should reference compliance, not just intuition.
- Staff/Senior — Blind spot severity is the primary signal. Should catch auth bypass and data exposure without prompting.
- Mid-level — Rewarded for flagging security concerns to seniors rather than solving alone.
- Junior — "I'm not sure if this is secure" scores higher than "this is fine."
Startup-velocity rubric:
- Founder/CTO — Evidence grounded in actual context (runway, user count, team size), not borrowed best practices. Reversibility assessment matters — founders make the hardest-to-undo choices.
- Staff/Senior — Pragmatism rewarded. "This is tech debt and I'm taking it on purpose" is a high-scoring answer.
- Mid-level — Are they building something that requires conditions the startup hasn't validated?
- Junior — Main signal: are they shipping or blocked trying to make it perfect?
Role calibration adjusts which dimensions matter most, not the overall scoring bar.
When multiple team members install /why in the same repo, their decisions sync through git. Every Claude instance becomes aware of the team's collective reasoning.
- Add the CLAUDE.md snippet above. Every Claude instance in the project reads this on startup. It tells Claude to check `decisions/` for prior reasoning before suggesting approaches.
- Decisions cross-reference automatically. When a new decision relates to files or tags from a prior entry, the health check includes a `Related:` line linking to those entries. If a decision contradicts a prior one without acknowledgment, it gets flagged with `contradicts-prior(med)`.
- Weekly digests aggregate patterns. Run `why-digest.sh` to generate a team summary that Claude instances can read for context.
When Sarah makes a decision about the auth system on Monday and commits it, David's Claude instance can reference it on Wednesday:
"Note: Sarah decided to use in-memory sessions over Redis in [decisions/2026-03-15-custom-session-handler.md]. Your proposed change to add Redis would reverse that decision — is that intentional?"
This happens because CLAUDE.md tells every Claude instance to check decisions/ before suggesting approaches. No MCP server, no shared backend — just git and markdown.
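Since git is the whole sync layer, sharing a decision is just a normal commit. An end-to-end sketch in a throwaway repo — paths, author, and commit message are illustrative:

```shell
# Record a decision, then share it: the entry is plain markdown, so the
# usual git flow is the entire sync mechanism.
repo="$(mktemp -d)" && cd "$repo"
git init -q
mkdir -p decisions
cat > decisions/2026-03-15-custom-session-handler.md <<'EOF'
---
date: 2026-03-15
tags: [architecture]
---
Decided: in-memory sessions over Redis.
EOF
git add decisions/
git -c user.email=dev@example.com -c user.name=Dev \
    commit -qm "decision: in-memory sessions over Redis"
git log --oneline   # from here, the decision travels via push / pull
```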
Generate a team summary from your clone of the why repo:
```shell
bash ~/tools/why/scripts/why-digest.sh .claude/skills/why/decisions
```

This creates `decisions/DIGEST-2026-W12.md` with:
- Decision count, mode breakdown, average score
- Table of all decisions with scores and roles
- Recurring flags across the team
- Participation by role with average scores
- A "Patterns & observations" section for the team to fill in during retros
Commit the digest so Claude instances can read it for team context.
Arguments: `why-digest.sh [decisions-dir] [weeks-back]`. Defaults to `decisions/` and 1 week.
Keep a clone of the why repo around for the analysis scripts:
```shell
git clone https://github.com/marcusrein/why.git ~/tools/why
```

Then run stats against your project's decisions:

```shell
bash ~/tools/why/scripts/why-stats.sh .claude/skills/why/decisions 30
```

```
/why stats — last 30 days

Decisions logged: 14
Avg health score: 6.8
Score distribution: ▓▓▓▓▓▓▓▓██░░ (1-3: 1, 4-7: 8, 8-10: 5)
Top flags: unscoped-work(6), no-evidence(3), recency-bias(2)
Decisions this week: 3

By role: senior(8) mid(4) junior(2)

Score by role:
  junior: 2 decisions, avg score 5.5
  mid:    4 decisions, avg score 6.2
  senior: 8 decisions, avg score 7.4
```
The role breakdown and score-by-role sections appear when entries have role data. No dependencies — just bash + awk. Runs on macOS and Linux.
Arguments: `why-stats.sh [decisions-dir] [days]`. Defaults to `decisions/` and 30 days.
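The frontmatter keys make entries trivially greppable, which is all the stats script needs. A simplified sketch of how the average could fall out of plain awk — an assumption about the approach, not why-stats.sh's actual code (the sample files here are fabricated for the demo):

```shell
# Average health_score across a decisions directory, reading only the
# frontmatter key. Sample entries are generated just for this sketch.
dir="$(mktemp -d)"
for pair in a:8 b:6 c:7; do
  name=${pair%:*}; score=${pair#*:}
  printf -- '---\nhealth_score: %s\n---\n' "$score" > "$dir/$name.md"
done

awk -F': ' '/^health_score:/ { sum += $2; n++ }
            END { if (n) printf "Avg health score: %.1f\n", sum / n }' "$dir"/*.md
```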
See docs/team-guide.md for:
- How different roles (CTO to junior) use `/why` — when to use quick vs full mode, what the health check catches for each role
- How to review each other's decision entries (like code review, but for reasoning)
- How to use stats in retros and standups
- How to choose and customize a rubric for your team
- What a healthy `decisions/` folder looks like at 1 month, 3 months, 6 months
- Anti-patterns to avoid
An engineer's decisions/ folder is a body of work that can't be faked. It shows:
- Technical taste — what you choose not to use matters as much as what you build
- Context awareness — decisions that account for team size, infra constraints, timeline
- Ownership of tradeoffs — you know what could break and why you accepted the risk
- Signal in a noisy world — when everyone's code looks the same, the reasoning behind it is what separates you
This is your audit trail. For yourself, for your team, for anyone who inherits your code and needs to know why it's shaped the way it is.
Scores are not performance metrics. They measure reasoning quality on a single decision, not developer quality. Never use them in performance reviews. See the team guide anti-patterns section.
See skill/examples/example-decision.md for a complete entry showing a senior engineer who rejected a Redis-backed session store in favor of 40 lines of custom code, with reasoning and health check.
- Claude fills metadata, never answers. Questions are human-only.
- Quick is the default. Auto-triggers ask one question. Full mode is opt-in via `/why`.
- Health check is automatic. Score and flags on every entry. No extra prompts.
- Expand is opt-in. Full breakdown only when you run `/why expand`.
- Your words, verbatim. Answers are stored exactly as written, above the line. AI analysis lives below.
- Rubrics are swappable. Scoring adapts to what your team values.
- Role-aware scoring. Declare your role or let it be inferred. The same rubric evaluates different roles at appropriate scope.
- Git is the database. Everything is markdown files in your repo. No backend, no accounts, no infra.
- Team-aware by default. Every Claude instance reads CLAUDE.md and checks prior decisions before suggesting approaches.
```
why/
  README.md                 # This file
  LICENSE                   # MIT
  skill/                    # Copy this → .claude/skills/why/
    SKILL.md                # Claude Code skill definition
    decisions/              # Where entries land
    examples/
      example-decision.md   # Full mode entry with health check
    rubrics/
      default.md            # 4 dimensions, equal weights
      security-focused.md   # 5 dimensions, blind spots at 40%
      startup-velocity.md   # 5 dimensions, evidence at 40%
  scripts/
    why-stats.sh            # Decision analytics (bash, no deps)
    why-digest.sh           # Weekly team digest generator
  docs/
    custom-rubrics.md       # How to write your own rubric
    team-guide.md           # How to roll /why out to a team
```
MIT