
Add shared long-term memory server (experimental) #5015

Draft
reyortiz3 wants to merge 24 commits into main from feature/memory-server-core

Conversation

@reyortiz3
Contributor

Summary

ToolHive manages MCPs (tools) and Skills (procedural knowledge as OCI artifacts). The missing primitive is shared long-term memory — a knowledge store that agents can query and contribute to across sessions. Without it every agent session starts cold, and facts learned in one session are invisible to others.

This PR introduces the memory server core (Plan 1 of 3):

  • pkg/memory/ — domain types (Entry, Revision, ListFilter), three pluggable interfaces (Store, VectorStore, Embedder), a Service orchestration layer with conflict detection and score-weighted search ranking, trust/staleness scoring formulas, and gomock mocks
  • pkg/memory/sqlite/ — SQLite-backed Store and VectorStore (Go-native cosine similarity, no CGo dependency); goose migrations including a TypeEpisodic type for time-indexed event records
  • pkg/memory/embedder/ollama/ — Ollama HTTP embedder that probes vector dimensions on startup
  • cmd/thv-memory/ — standalone MCP server binary serving 9 tools over streamable HTTP (/mcp), with a /health liveness probe, YAML config with sensible defaults, and a background lifecycle job (TTL expiry, score recomputation every 24h)
  • docs/proposals/2026-04-22-shared-memory-server.md — full design doc covering architecture, tool surface, scoring, conflict detection, Skills relationship, comparison with LinkedIn's Cognitive Memory Agent, and the recommended three-tier memory activation strategy

Key design decisions:

  • Conflict detection is write-time (cosine similarity > 0.85 blocks the write and returns candidates for the agent to resolve — no LLM inference needed)
  • Search results are ranked by similarity × trust_score × (1 − 0.3 × staleness_score) so flagged/stale entries don't rank above fresh, trusted ones
  • The agent IS the retrieval orchestrator — tools are explicit MCP calls, not auto-triggered pipelines
  • Three memory types: semantic (aggregated facts), procedural (how-to), episodic (time-indexed events with CreatedAfter/CreatedBefore list filters)
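
The ranking formula in the list above can be sketched in a few lines of Go. The `candidate` struct and the `compositeScore` and `rank` helpers are hypothetical names for illustration and are not taken from pkg/memory/service.go; the sketch also assumes all three inputs lie in [0, 1].

```go
package main

import (
	"fmt"
	"sort"
)

// stalenessWeight is the 0.3 penalty weight quoted in the PR description.
const stalenessWeight = 0.3

// candidate is a hypothetical search hit.
type candidate struct {
	ID         string
	Similarity float64
	Trust      float64
	Staleness  float64
}

// compositeScore computes similarity × trust_score × (1 − 0.3 × staleness_score).
func compositeScore(c candidate) float64 {
	return c.Similarity * c.Trust * (1 - stalenessWeight*c.Staleness)
}

// rank sorts candidates in place, highest composite score first.
func rank(cs []candidate) {
	sort.SliceStable(cs, func(i, j int) bool {
		return compositeScore(cs[i]) > compositeScore(cs[j])
	})
}

func main() {
	hits := []candidate{
		{ID: "stale-flagged", Similarity: 0.95, Trust: 0.5, Staleness: 1.0},
		{ID: "fresh-trusted", Similarity: 0.80, Trust: 1.0, Staleness: 0.0},
	}
	rank(hits)
	for _, h := range hits {
		fmt.Printf("%s %.4f\n", h.ID, compositeScore(h))
	}
}
```

In this example the fresh, trusted entry ranks first even though its raw similarity is lower, which is the behaviour the formula is meant to produce.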

Plans 2 (CLI thv memory subcommand + system workload integration) and 3 (Kubernetes MCPMemoryServer CRD) are follow-up work.

Type of change

  • New feature

Test plan

  • Unit tests (task test)
  • Linting (task lint-fix)
  • Manual testing (integration test in cmd/thv-memory/integration_test.go wires real SQLite store + vector store + fake embedder end-to-end: remember → search → access count increment → delete → ErrNotFound; conflict detection test verifies force-write path)

Changes

| File | Change |
| --- | --- |
| pkg/memory/types.go | Domain types: Entry, Revision, ListFilter (with time-range fields), VectorFilter, Type (semantic/procedural/episodic), scoring types |
| pkg/memory/interfaces.go | Store, VectorStore, Embedder interfaces + mockgen directives |
| pkg/memory/service.go | Service: conflict detection, Remember, Search with composite ranking |
| pkg/memory/scoring.go | ComputeTrustScore, ComputeStalenessScore |
| pkg/memory/sqlite/ | SQLite Store, VectorStore, goose migrations (001 initial + 002 adds episodic type) |
| pkg/memory/embedder/ollama/ | Ollama HTTP embedder |
| pkg/memory/mocks/ | Generated gomock mocks for all three interfaces |
| cmd/thv-memory/main.go | Entry point: HTTP server lifecycle, graceful shutdown |
| cmd/thv-memory/server.go | MCP server construction, tool registration, streamable HTTP handler + /health |
| cmd/thv-memory/config.go | YAML config with defaults (SQLite, Ollama, 0.0.0.0:8080) |
| cmd/thv-memory/lifecycle/job.go | Background job: TTL expiry + score recomputation |
| cmd/thv-memory/tools/ | 9 MCP tool handlers |
| cmd/thv-memory/integration_test.go | End-to-end integration test |
| docs/proposals/2026-04-22-shared-memory-server.md | Design doc |

Does this introduce a user-facing change?

No — this adds a new standalone binary (cmd/thv-memory) and supporting packages. Nothing in the existing CLI or operator is modified. The binary is not yet wired into thv commands (that is Plan 2).

Implementation plan

Approved implementation plan

This PR was planned and implemented with Claude Code. The design spec is at docs/proposals/2026-04-22-shared-memory-server.md. The implementation follows the spec with the following notable adaptations:

  • Type names use Go stutter-avoidance convention (Entry not MemoryEntry, Store not MemoryStore) to satisfy the revive linter
  • goose.NewProvider (scoped) used instead of global goose.SetBaseFS/SetDialect to avoid concurrent-open races
  • server.NewStreamableHTTPServer + server.WithStdioContextFunc used to match actual mcp-go v0.48.0 API
  • SQLite VectorStore uses Go-native cosine similarity (no CGo/sqlite-vec dependency) with a load-and-score approach; external VectorStore providers are pluggable via the VectorStore interface for datasets > 100K entries
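
For readers unfamiliar with the load-and-score approach named in the last bullet, a minimal sketch might look like the following. It is an illustration only and not the code in pkg/memory/sqlite/vector.go: the `cosine` and `topK` helpers, the `scored` type, and the in-memory map standing in for rows loaded from SQLite are all assumptions.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cosine returns the cosine similarity of a and b, or 0 when the lengths
// differ or either vector has zero magnitude.
func cosine(a, b []float32) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		na += x * x
		nb += y * y
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// scored pairs an entry ID with its similarity to the query.
type scored struct {
	ID    string
	Score float64
}

// topK scores every stored vector against the query and keeps the k best.
// Every query touches all n vectors of dimension d, so cost grows as O(n·d).
func topK(query []float32, stored map[string][]float32, k int) []scored {
	if k < 0 {
		k = 0
	}
	out := make([]scored, 0, len(stored))
	for id, vec := range stored {
		out = append(out, scored{ID: id, Score: cosine(query, vec)})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	if len(out) > k {
		out = out[:k]
	}
	return out
}

func main() {
	stored := map[string][]float32{"a": {1, 0}, "b": {0, 1}, "c": {1, 1}}
	for _, s := range topK([]float32{1, 0}, stored, 2) {
		fmt.Printf("%s %.3f\n", s.ID, s.Score)
	}
}
```

The linear scan is what keeps the implementation free of CGo, and it is also why an indexed external VectorStore becomes attractive as the entry count grows.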

Special notes for reviewers

This is experimental — do not merge until Plans 2 and 3 are ready. Specific areas to scrutinise:

  • pkg/memory/sqlite/vector.go: the load-all-and-score approach works for small datasets but will not scale past ~100K entries. The VectorStore interface is designed to be swapped for Qdrant/pgvector when needed.
  • pkg/memory/service.go: the conflict threshold (0.85) and staleness penalty weight (0.3) are initial values — they will need tuning against real usage data.
  • cmd/thv-memory/server.go: no auth middleware on the MCP endpoint yet. Auth will be enforced at the ToolHive proxy layer when the system workload integration lands in Plan 2.

Generated with Claude Code

reyortiz3 and others added 5 commits April 21, 2026 18:25
Introduces the core domain types for ToolHive's shared long-term memory
system: MemoryEntry, MemoryRevision, typed constants for MemoryType,
AuthorType, SourceType, EntryStatus, and ArchiveReason, plus filter and
result types used by the store interface.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduces pkg/memory with three pluggable interfaces (Store, VectorStore,
Embedder), a Service orchestration layer with conflict detection and
score-weighted search ranking, SQLite-backed implementations, an Ollama
embedder, and gomock mocks for all interfaces.

Key behaviours:
- Conflict detection on write: cosine similarity > 0.85 blocks the write
  and returns conflicting entries for the agent to resolve
- Trust scoring: author weight × age decay × correction penalty × flag multiplier
- Staleness scoring: access age + flag bonus + correction bonus
- Search ranking: composite score (similarity × trust × staleness penalty)
  so flagged/stale entries do not rank above fresh, trusted ones
- TypeEpisodic memory type for time-indexed event records
- ListFilter time-range fields (CreatedAfter/CreatedBefore) for timeline queries
- SQLite migration 002 widens the type CHECK constraint to include episodic

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Standalone MCP server exposing 9 memory tools over streamable HTTP
(/mcp endpoint, /health liveness probe). Wires SQLite store and vector
store, Ollama embedder, and a background lifecycle job that runs every
24h to expire TTL'd entries and recompute trust/staleness scores.

Tools: memory_remember, memory_search, memory_recall, memory_forget,
memory_update, memory_flag, memory_list, memory_consolidate,
memory_crystallize.

Config via memory-server.yaml with defaults (SQLite + sqlite-vec +
Ollama on localhost:11434, listening on 0.0.0.0:8080).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Covers architecture, MCP tool surface, trust/staleness scoring,
conflict detection, Skills relationship, a comparison with LinkedIn's
Cognitive Memory Agent, and the recommended three-tier memory activation
strategy (session-boundary injection, signal-based mid-session reads,
write-on-observation).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor

@github-actions github-actions Bot left a comment


Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

@github-actions github-actions Bot added the size/XL Extra large PR: 1000+ lines changed label Apr 22, 2026
@codecov

codecov Bot commented Apr 22, 2026

Codecov Report

❌ Patch coverage is 25.90874% with 958 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.51%. Comparing base (cffe934) to head (9bb592f).
⚠️ Report is 68 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| cmd/thv-memory/resources/api.go | 0.00% | 168 Missing ⚠️ |
| demo/recruiter/cmd/demo/main.go | 0.00% | 162 Missing ⚠️ |
| pkg/memory/sqlite/store.go | 59.07% | 80 Missing and 17 partials ⚠️ |
| cmd/thv-memory/server.go | 0.00% | 78 Missing ⚠️ |
| cmd/thv-memory/main.go | 0.00% | 69 Missing ⚠️ |
| cmd/thv-memory/tools/crystallize.go | 0.00% | 39 Missing ⚠️ |
| cmd/thv-memory/tools/remember.go | 0.00% | 38 Missing ⚠️ |
| cmd/thv-memory/tools/consolidate.go | 0.00% | 37 Missing ⚠️ |
| cmd/thv-memory/config.go | 0.00% | 36 Missing ⚠️ |
| pkg/memory/service.go | 63.91% | 20 Missing and 15 partials ⚠️ |

... and 11 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5015      +/-   ##
==========================================
- Coverage   69.02%   66.51%   -2.52%     
==========================================
  Files         554      621      +67     
  Lines       73075    61912   -11163     
==========================================
- Hits        50443    41180    -9263     
+ Misses      19620    17571    -2049     
- Partials     3012     3161     +149     


@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 22, 2026
…s sync

UI-managed resource entries (source=resource) are stored in the database and
exposed read-only to agents via MCP Resources protocol. The management REST
API (/api/resources) lets the UI create, update, and delete resources; the MCP
server is kept in sync so agents receive list_changed notifications and can
discover new content progressively through memory_search or resources/list.

- Add SourceResource type and ErrReadOnly sentinel; protect tool layer from
  mutating skill/resource entries (forget, update, flag)
- Extend VectorFilter with Source field so search can scope to resources
- Add management REST API package (cmd/thv-memory/resources/api.go) wired via
  closure injection to avoid circular imports with package main
- Refactor server.go: split newMCPServer / newHandler, add RegisterResourceEntry
  / UnregisterResourceEntry helpers, LoadExistingResources at startup
- Switch HTTP handler to streamable-HTTP transport (NewStreamableHTTPServer)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 24, 2026
Demonstrates all memory types in a realistic hiring workflow: a recruiter
and hiring manager share memory across separate MCP sessions, showing how
episodic and semantic memories prevent repeated mistakes (visa/salary
mismatches) and how procedural patterns crystallize into a reusable Skill.

demo/recruiter/
  Makefile                — build, start, demo, teardown in one place
  cmd/demo/main.go        — Go binary: 7-phase recruiter scenario
  config/memory-server.yaml.tmpl — demo server config (port 8765, SQLite)
  data/job-description.txt       — Senior Go Engineer JD ($100K–$150K)
  demo.tape               — VHS recording script

Run: cd demo/recruiter && make all
     cd demo/recruiter && make teardown  (repeat)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 24, 2026
task build only compiles bin/thv, not bin/thv-memory.
Use go build directly to produce the correct server binary.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 24, 2026
…arsing

- Migration 003: widen source CHECK to include 'resource' (was only
  'memory','skill'), fixing the constraint violation on resource creation
- remember tool: expose 'tags' parameter so agents can label memories
  at write time (was silently dropped)
- Demo Makefile: teardown now deletes server binary so rebuild always
  picks up new migrations
- Demo binary: add HTTP status check on resource upload; fix JSON field
  names to match Go struct serialisation (MemoryID not memory_id,
  Entry.Content not content at top level)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 24, 2026
Phases 1-2 (resource upload, semantic memory) are handled by the Go setup
binary. Phases 3-7 are now real Claude Code sessions using --print mode so
the demo shows an actual AI agent consuming the memory MCP.

- Add prompts/ directory with per-session prompt files for each phase:
  recruiter-alice, hiring-manager, recruiter-bob, recruiter-charlie, crystallize
- Add Makefile targets: session-recruiter-alice, session-hiring-manager,
  session-recruiter-bob, session-recruiter-charlie, session-crystallize
- Add mcp-config target that generates .demo.mcp.json for Claude Code
- Update demo target to run setup binary then all five agent sessions in sequence
- Update demo.tape VHS recording to show Claude Code session targets
- Teardown now also removes .demo.mcp.json

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 24, 2026
Previous prompts told Claude which tools to call and in what order.
They now read like real messages a recruiter would type — the agent
discovers visa policy, logs outcomes, and crystallizes runbooks on its
own by deciding when to reach for memory_search, memory_remember, and
memory_crystallize.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 26, 2026
Instead of running claude --print automatically, each session target
now prints the prompt to use and pauses until Enter is pressed. This
lets you run the Claude Code session yourself (with the MCP config
shown) before the demo advances to the next phase.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
15-slide deck covering the architecture (memory types, lifecycle,
crystallization path) followed by a slide-by-slide walkthrough of
the recruiter scenario. Self-contained HTML — open in browser and
present fullscreen, no build step needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 28, 2026
- Add part dividers to structure narrative (Problem → Architecture → Demo)
- Add "What Research Tells Us" slide with 3 sources
- Add comparison table: LinkedIn CMA vs ToolHive Memory
- Add References slide with full citations for all three sources
- Fix memory types slide — replace 2x2 card grid with compact rows
  that no longer overflow the slide
- Set center:false so slides don't jump vertically between content sizes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 28, 2026
- Increase logical height 680→750 and tighten margin 0.05→0.04
  to give dense slides more breathing room
- Shrink ref-card padding/font and comparison table row padding
  so Research and References slides fit without clipping
- Reduce h2 and phase-list padding to reclaim vertical space
- Add tenant-isolation row to comparison table for completeness

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 29, 2026
Reveal sets height:auto on section elements, so overflow-y:auto had
nothing to overflow against — the slides container clipped instead.
Fix: pin each section to the configured height (750px) via a ready
event handler, then let overflow-y:auto do its job. Adds a subtle
dark-themed scrollbar so it's unobtrusive during presentation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 29, 2026
- Replace the three single-technology backend boxes with three
  labeled backend groups (Storage, Vector Store, Embedder), each
  showing the default option and all supported alternatives
- Remove 'goose migrations / Schema Manager' box — internal
  implementation detail with no meaning to an audience
- Memory Server box now shows the MCP transport badge and lists
  tools with a brief description of each (memory_remember,
  memory_search, memory_crystallize, resources/list)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 29, 2026
The manage phase (flag, update, consolidate, crystallize) is the
most differentiated part of the design — the research explicitly
calls it the most neglected dimension. It deserves equal billing
from the very first slide.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 29, 2026
Shows current tag-based workaround vs the planned namespace isolation
approach, with a three-tier diagram (global → team → project). Explains
that the proxy stamps the namespace from OIDC context so agents never
set it themselves — one schema migration, no API surface change.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 30, 2026
Placed between the research sources slide and the comparison table
so the tensions motivate the design choices that follow. Each row
shows the tension name, the competing forces, and where ToolHive
Memory lands — Utility/Efficiency, Faithfulness/Adaptivity,
Adaptivity/Stability, Efficiency, and Governance.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 30, 2026
reyortiz3 and others added 2 commits April 30, 2026 16:02
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 30, 2026
