
Memory Trail v1.1

Open-source decision memory and session logging for AI-assisted development.

License: CC BY 4.0

"Checkpoints track WHAT changed. Memory Trail tracks WHY."


The Problem

AI coding assistants make decisions but don't remember why. Each session starts fresh.

Session 1: "Should I cache images locally?" → Agent caches images
Session 2: "Should I cache images locally?" → Agent caches images again
Session 3: User discovers TOS violation, deletes cache
Session 4: "Should I cache images locally?" → Agent caches images (forgot again)

The agent isn't malfunctioning. It simply has no memory of past decisions.

Git tracks WHAT changed. Nobody tracks WHY.

| Tool | Tracks |
|------|--------|
| Git | Code diffs — WHAT changed |
| Changelog | Release notes — WHAT shipped |
| Memory Trail | Decision rationale — WHY it changed |

What Is Memory Trail?

Memory Trail is an open-source persistence layer for AI-assisted development.

  • Memory — Decisions persist across sessions
  • Trail — Follow the breadcrumbs back to understand WHY

Core Components

| Component | Purpose | Format |
|-----------|---------|--------|
| Decision Memory | Architectural constraints agents must follow | docs/DECISION_MEMORY.md |
| Confidence Protocol | Structured uncertainty signaling with risk adjustments | Agent behavior |
| STOP Triggers | Hard stops for dangerous operations | Agent behavior |
| Session Logs | Per-task action traces | docs/sessions/SES-YYYY-MM-DD-NNN.md |

With Memory Trail

Session 1: Agent checks Decision Memory → [DEC-001] Use PostgreSQL
Session 2: Agent checks Decision Memory → [DEC-001] Use PostgreSQL
Session 3: Agent checks Decision Memory → [DEC-001] Use PostgreSQL
(Constraint persists across all sessions)

Quick Start

Option 1: Claude Skills (claude.ai / Claude Desktop)

  1. Download memory-trail.zip
  2. Go to Settings → Features → Skills → Add
  3. Upload the zip file
  4. Claude will now track decisions and sessions

Option 2: Claude Code / Roo Code

  1. Copy SKILL.md to your project's skills folder
  2. Copy templates from assets/ to your project:
    cp assets/DECISION_MEMORY_TEMPLATE.md docs/DECISION_MEMORY.md
    cp assets/AGENT_RULES_TEMPLATE.md .roo/rules-code/rules.md
    mkdir -p docs/sessions
  3. Add your first decisions to docs/DECISION_MEMORY.md

Option 3: Claude Projects

Add SKILL.md to project knowledge + copy templates to project files.

Option 4: Manual / Other LLMs

Use the Decision Memory format and Session Log format manually.

For Cursor, Windsurf, or other AI tools, adapt AGENT_RULES_TEMPLATE.md to your tool's configuration format.


The 4 Components

1. Decision Memory

Architectural decisions as constraints. Agents read before implementing.

### [DEC-001] PostgreSQL Over MongoDB
**Category:** TECHNOLOGY  
**Date:** 2025-01-15  
**Status:** ACTIVE

**Context:** Need to choose primary database. Team has SQL experience.

**Decision:** Use PostgreSQL for all persistent data.

**Consequences:**
- All schemas use relational design patterns
- JSONB for semi-structured data where needed
- No document-style collections

Agent protocol:

  • Read before significant changes
  • Cite as [per DEC-XXX] when following
  • STOP if action conflicts
  • Propose new decisions for architectural choices

Decision lifecycle:

Proposed → Active → [Superseded | Deprecated]
| Transition | Trigger | Action Required |
|------------|---------|-----------------|
| Proposed → Active | Human approval | Update status, set date |
| Active → Superseded | New decision replaces it | Link to replacement [DEC-XXX], keep old for context |
| Active → Deprecated | No longer relevant | Add deprecation reason |

Agents must follow ACTIVE decisions, follow the replacement for SUPERSEDED, and ignore DEPRECATED.
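The status rules above can be sketched in code. This is a minimal, illustrative sketch (not part of Memory Trail itself); the `Decision` record and its field names are assumptions about how a tool might parse DECISION_MEMORY.md:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Decision:
    dec_id: str                          # e.g. "DEC-001"
    status: str                          # PROPOSED | ACTIVE | SUPERSEDED | DEPRECATED
    superseded_by: Optional[str] = None  # set when status == SUPERSEDED

def effective_constraints(decisions: list[Decision]) -> list[str]:
    """Return the [DEC-XXX] ids an agent must follow right now."""
    by_id = {d.dec_id: d for d in decisions}
    active = set()
    for d in decisions:
        if d.status == "ACTIVE":
            active.add(d.dec_id)
        elif d.status == "SUPERSEDED" and d.superseded_by:
            # Follow the replacement, not the superseded decision
            repl = by_id.get(d.superseded_by)
            if repl and repl.status == "ACTIVE":
                active.add(repl.dec_id)
        # PROPOSED and DEPRECATED decisions impose no constraint
    return sorted(active)
```

A superseded decision thus routes the agent to its replacement, while deprecated entries are kept only as historical context.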

2. Confidence Protocol

Signal uncertainty at START of every response:

| Level | Threshold | Behavior |
|-------|-----------|----------|
| 🟢 CERTAIN | 95%+ | Proceed, log action |
| 🔵 CONFIDENT | 80-94% | Show intent, proceed |
| 🟡 PROBABLE | 60-79% | Explain rationale, request approval |
| 🟠 UNCERTAIN | 40-59% | Present options, human chooses |
| 🔴 UNCLEAR | <40% | Ask first, don't proceed |

Risk adjustments modify the base confidence before choosing behavior:

| Factor | Adjustment | When |
|--------|------------|------|
| DESTRUCTIVE | -15% | DELETE, DROP, bulk removal |
| IRREVERSIBLE | -25% | Schema migrations, renaming |
| SECURITY | -20% | Auth, permissions, secrets |
| EXTERNAL | -10% | API calls, third-party services |
| TESTED | +15% | Has test coverage |
| REVERSIBLE | +10% | Git-tracked, additive change |
| ISOLATED | +10% | Feature-flagged, self-contained |

Example calculations:

Task: Refactoring a function with tests
Base confidence: 85%  + TESTED: +15%  + REVERSIBLE: +10%
= Effective: 100% (capped at 100%) → 🟢 CERTAIN → Proceed

Task: Changing auth logic
Base confidence: 90%  - SECURITY: -20%  - IRREVERSIBLE: -25%
= Effective: 45% → 🟠 UNCERTAIN → Present options, human chooses

Task: Deleting unused migration files
Base confidence: 75%  - DESTRUCTIVE: -15%  - IRREVERSIBLE: -25%
= Effective: 35% → 🔴 UNCLEAR → STOP, ask first

The protocol forces the agent to downshift when context is risky, even when the action itself seems straightforward.
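The arithmetic above is simple enough to express directly. A minimal sketch, assuming scores are clamped to 0-100 (the protocol text implies the cap but does not specify a floor):

```python
# Risk adjustments from the Confidence Protocol table
ADJUSTMENTS = {
    "DESTRUCTIVE": -15, "IRREVERSIBLE": -25, "SECURITY": -20,
    "EXTERNAL": -10, "TESTED": +15, "REVERSIBLE": +10, "ISOLATED": +10,
}

# (minimum effective confidence, label) in descending order
LEVELS = [
    (95, "🟢 CERTAIN"),
    (80, "🔵 CONFIDENT"),
    (60, "🟡 PROBABLE"),
    (40, "🟠 UNCERTAIN"),
    (0,  "🔴 UNCLEAR"),
]

def effective_confidence(base: int, factors: list[str]) -> int:
    """Apply risk adjustments to base confidence, clamped to 0-100."""
    score = base + sum(ADJUSTMENTS[f] for f in factors)
    return max(0, min(100, score))

def signal(base: int, factors: list[str]) -> str:
    """Map effective confidence to the level the agent must signal."""
    score = effective_confidence(base, factors)
    for floor, label in LEVELS:
        if score >= floor:
            return label
    return "🔴 UNCLEAR"
```

Running the three worked examples through `signal` reproduces the levels shown above, e.g. `signal(90, ["SECURITY", "IRREVERSIBLE"])` yields 🟠 UNCERTAIN.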

3. STOP Triggers

Hard stops requiring human decision:

| Category | Examples |
|----------|----------|
| Security | API keys, auth logic, encryption |
| Destructive | DELETE, DROP, bulk removal |
| Irreversible | Schema migrations, renaming |
| Financial | Payment code, pricing logic |

When triggered: 🔴 UNCLEAR → Explain risk → Present 2-3 options → Wait.

Example:

[Confidence: 🔴 UNCLEAR — DESTRUCTIVE + IRREVERSIBLE]

This task involves dropping the users_legacy table.

Risk: Data loss. Migration cannot be undone without backup.

Options:
1. Rename to _archive_users_legacy (reversible)
2. Export to CSV first, then drop
3. Skip — keep table, add deprecation comment

Which approach do you prefer?
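Memory Trail currently relies on the LLM to recognize these categories (see Known Limitations), but the kind of automated check the roadmap points toward could start as simple keyword matching. A sketch, with illustrative patterns only; a real checker would need tool-specific context, not just text:

```python
import re

# Illustrative keyword patterns per STOP category (not exhaustive)
STOP_PATTERNS = {
    "Destructive":  r"(DROP\s+TABLE|DELETE\s+FROM|TRUNCATE|rm\s+-rf)",
    "Security":     r"(api[_-]?key|secret|password|auth)",
    "Irreversible": r"(migration|migrate|rename)",
    "Financial":    r"(payment|pricing|invoice)",
}

def stop_categories(action_text: str) -> list[str]:
    """Return the STOP trigger categories a proposed action matches."""
    return [
        category
        for category, pattern in STOP_PATTERNS.items()
        if re.search(pattern, action_text, re.IGNORECASE)
    ]
```

An action that matches any category should force the 🔴 UNCLEAR flow shown above: explain the risk, present options, wait.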

4. Session Logs

Per-task action tracing. One file per task.

# Session Log: SES-2025-12-29-001

**Date:** 2025-12-29
**Agent:** Claude Desktop

## Actions

| Action | Confidence | Decisions | Files |
|--------|------------|-----------|-------|
| Added loading spinner | 🟢 | [DEC-002] | dashboard.html |
| Fixed auth redirect | 🔵 | [DEC-003] | auth.py |

## Handoff
Next: implement caching per [DEC-004]

Rules:

  • One file per task (never append)
  • Sequential numbering: 001, 002, 003...
  • Merge to *-recap.md daily
  • For history: read only recap files
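The numbering rule above is easy to get wrong by hand. A minimal sketch of computing the next session filename, assuming the naming scheme shown (`SES-YYYY-MM-DD-NNN.md`); recap files are ignored because they don't match the numeric suffix:

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional

def next_session_path(sessions_dir: str, today: Optional[date] = None) -> Path:
    """Compute the next SES-YYYY-MM-DD-NNN.md path for today."""
    today = today or date.today()
    stamp = today.isoformat()  # e.g. "2025-12-29"
    folder = Path(sessions_dir)
    pattern = re.compile(rf"SES-{stamp}-(\d{{3}})\.md$")
    numbers = [
        int(m.group(1))
        for p in folder.glob(f"SES-{stamp}-*.md")
        if (m := pattern.search(p.name))
    ]
    return folder / f"SES-{stamp}-{max(numbers, default=0) + 1:03d}.md"
```

With `SES-2025-12-29-001.md` and `SES-2025-12-29-002.md` already on disk, the function returns `SES-2025-12-29-003.md`.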

Pre-flight Protocol

Before any significant action, agents should run this checklist:

☐ DECISION_MEMORY.md read this session?
  → NO → Read it now
  → YES → Proceed

☐ Relevant [DEC-XXX] constraints?
  → YES → Cite: "Implementing per [DEC-XXX]"
  → CONFLICTS → STOP, flag to human

☐ STOP trigger category?
  → Security / Destructive / Irreversible / Financial
  → Apply risk adjustments → Signal confidence → Wait if 🔴

☐ Confidence signaled at START of response?
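The checklist above can be condensed into a single gate. A sketch only; the boolean and list inputs are assumptions about how an agent harness would surface each checklist item, and the thresholds follow the Confidence Protocol (proceed only at 🔵 CONFIDENT or above):

```python
from typing import Optional

def preflight(memory_read: bool,
              conflicts: list[str],
              stop_hit: bool,
              effective_confidence: int,
              cited: Optional[list[str]] = None) -> tuple[bool, str]:
    """Return (may_proceed, reason) for one significant action."""
    if not memory_read:
        return False, "Read DECISION_MEMORY.md first"
    if conflicts:
        return False, f"STOP: conflicts with {', '.join(conflicts)} — flag to human"
    if stop_hit and effective_confidence < 40:
        return False, "🔴 UNCLEAR on a STOP category: present options, wait"
    if effective_confidence < 40:
        return False, "🔴 UNCLEAR: ask first"
    if effective_confidence < 60:
        return False, "🟠 UNCERTAIN: present options, human chooses"
    if effective_confidence < 80:
        return False, "🟡 PROBABLE: explain rationale, request approval"
    reason = "Proceed" + (f" per {', '.join(cited)}" if cited else "")
    return True, reason
```

Note the ordering: a decision conflict blocks the action regardless of confidence, mirroring the checklist's CONFLICTS branch.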

File Structure

project/
├── docs/
│   ├── DECISION_MEMORY.md          # Architectural constraints
│   └── sessions/
│       ├── SES-2025-12-29-001.md   # Task 1
│       ├── SES-2025-12-29-002.md   # Task 2
│       └── SES-2025-12-29-recap.md # Daily summary
├── .roo/rules-code/
│   └── rules.md                    # Agent rules (Roo Code)
└── CLAUDE.md                       # Project conventions

Multi-Agent Coordination

When multiple agents work on the same project, they share context via files:

| File | Who Writes | Who Reads |
|------|------------|-----------|
| docs/DECISION_MEMORY.md | Human + Agents | All agents |
| docs/sessions/SES-*.md | Each agent | All (recaps only) |
| CLAUDE.md / rules file | Human | All agents |

Example workflow:

  1. Claude Desktop analyzes: "Should we use MongoDB here?"
  2. Finds [DEC-001] conflict, proposes options
  3. Human decides, Claude Desktop logs session
  4. Roo Code implements, reads same Decision Memory
  5. Constraint maintained across both agents

Known Limitations

Memory Trail v1.1 is a documentation protocol, not a runtime system. Being honest about what it doesn't do:

| Limitation | Description | Implication |
|------------|-------------|-------------|
| No automated enforcement | STOP triggers and confidence protocol rely on LLM compliance, not tooling | An agent can ignore protocols — nothing prevents it |
| No conflict detection | Decision Memory is read passively; violations aren't caught automatically | Agents may contradict decisions without being flagged |
| No decision extraction | Implicit decisions in session logs stay implicit unless a human elevates them | Important decisions can be lost in session history |
| No cross-session learning | Each session reads files fresh; no persistent agent memory beyond files | Agent doesn't "get better" at following your preferences |
| LLM reliability | Confidence signaling depends on the model accurately self-assessing | Models may over- or under-estimate confidence |
| Single-human scale | HITL model assumes one person can review all decisions | Doesn't scale to large teams without additional tooling |

These limitations are known and inform the roadmap. The protocol is valuable even with these constraints — having decisions documented is strictly better than not — but users should understand what "compliance" means in a system without enforcement.

Lessons learned (2026-01): In production use, we observed an incident where an AI agent proposed cross-repository bulk deletion (rm -rf across 4 directories) despite STOP triggers being documented. The agent signaled no uncertainty and presented no alternatives. Memory Trail protocols existed but were not enforced by tooling. This confirmed that documentation-only protocols have a compliance ceiling — the agent followed protocols most of the time, but failed on the one action where it mattered most. See INCIDENT_2026-01-16_FAILSAFE_PROPOSAL.md for the full analysis and proposed automated enforcement.


Intended Scope

Designed for:

  • Solopreneurs and indie developers
  • Small teams (1-5 people)
  • Startups in early stages
  • Projects where one human can maintain full context

Not designed for:

  • Enterprise teams (50+ engineers)
  • Multi-team coordination
  • Compliance-heavy environments

Why this boundary: Memory Trail optimizes for velocity and low overhead. The HITL model assumes a single human can verify claims and make decisions.


Prior Art

Memory Trail builds on established patterns. See PRIOR_ART.md.

  • Architecture Decision Records (ADRs): Nygard's original format, MADR
  • Project Memory: Cursor rules, CLAUDE.md conventions
  • Session Tracking: Development journals, captain's logs

The opportunity: ADRs exist but aren't optimized for AI consumption. Project memory files exist but lack decision tracking structure. Memory Trail bridges this gap.


Documentation

| Document | Description |
|----------|-------------|
| ARCHITECTURE.md | Full component specs, formats, protocols |
| PRIOR_ART.md | Landscape of related patterns |
| ROADMAP.md | Future development plans |
| SKILL.md | Claude skill implementation |
| assets/ | Ready-to-use templates |
| examples/ | Real-world usage examples |

Related

Stream Coding Stack — Integrated methodology (Memory Trail + Stream Coding + Clarity Gate)
github.com/frmoretto/stream-coding-stack

Stream Coding — Documentation-first development methodology
github.com/frmoretto/stream-coding

Clarity Gate — Pre-ingestion verification for epistemic quality
github.com/frmoretto/clarity-gate

ArXiParse — Production system where Memory Trail was developed
arxiparse.org


License

CC BY 4.0 — Use freely with attribution.


Author

Francesco Marinoni Moretto


Contributing

Looking for:

  1. Integration — Cursor, Windsurf, other AI tool configurations
  2. Templates — Domain-specific decision templates
  3. Examples — Real project decision memories
  4. Feedback — Is the 4-component structure right?
  5. Enforcement — Automated tooling for STOP trigger compliance

Open an issue or PR.
