
# Agentic Memory Research

Research collection on agent memory architectures, persistence patterns, and output quality maintenance for LLM-based agent systems.

## Citation

If you reference this repo’s summaries/analyses in academic or professional work, please cite:

```bibtex
@misc{lin_agentic_memory_2026,
  author       = {Leonard Lin},
  title        = {agentic-memory: Agentic Memory Research Collection (Summaries and Analyses)},
  year         = {2026},
  howpublished = {GitHub repository},
  url          = {https://github.com/lhl/agentic-memory},
}
```

## Reference Summaries

| Document | Author | Description |
|---|---|---|
| jumperz-agent-memory-stack | @jumperz | 31-piece memory architecture split across 3 phases (Core → Reliability → Intelligence). Complete prompt/spec breakdowns for write pipeline, read pipeline, decay, knowledge graph, episodic memory, trust scoring, echo/fizzle feedback loops. The foundational reference that others build on. |
| joelhooks-adr-0077-memory-system-next-phase | @joelhooks | ADR for joelclaw (personal AI Mac Mini). Maps existing production system (~6 days running, Qdrant 1,343 points) against jumperz's 31 pieces. Plans 3 increments: retrieval quality (score decay, query rewriting), storage quality (dedup, nightly maintenance), feedback loop (echo/fizzle). Includes detailed gap analysis. |
| coolmanns-openclaw-memory-architecture | coolmanns | 12-layer production memory stack for OpenClaw with 14 agents. SQLite+FTS5 knowledge graph (3,108 facts), llama.cpp GPU embeddings (768d, 7ms), three runtime plugins (continuity, stability, graph-memory). 100% recall on a 60-query benchmark. Includes activation/decay system, domain RAG, session boot sequences. |
| drag88-agent-output-degradation | @drag88 (Aswin) | "Why Your Agent's Output Gets Worse Over Time" — the multi-agent convergence problem. 4-tier memory (working → episodic → semantic → procedural). 3-layer enforcement pipeline (YAML regex → Gemini LLM judge → self-learning loop). Core insight: convert expensive runtime LLM checks into free static regex rules over time. |
| versatly-clawvault | Versatly (@drag88) | ClawVault npm CLI tool — structured markdown memory vault with observation pipeline, knowledge graph, session lifecycle (wake/sleep/checkpoint), task/project primitives, Obsidian integration, OpenClaw hooks. 449+ tests. v2.6.1. |
| vstorm-memv | vstorm-co | memv (PyPI: memvee) — Nemori-inspired predict-calibrate extraction + episode segmentation, plus Graphiti-style bi-temporal validity and hybrid retrieval (sqlite-vec + FTS5 + RRF) on SQLite. |
| supermemory | Dhravya Shah / supermemoryai | Supermemory memory-as-a-service API: memory versioning (linked-list chains), typed relationships (updates/extends/derives), static/dynamic profile synthesis, time-based forgetting with reason tracking, multi-model embedding storage. Critical caveat: the open-source repo is frontend/SDK only; the core engine is a proprietary backend at api.supermemory.ai. |
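Several systems above fuse heterogeneous retrievers with Reciprocal Rank Fusion (memv combines sqlite-vec, FTS5, and RRF). A minimal sketch of the standard RRF formula, `score(d) = Σ 1/(k + rank_r(d))` with the conventional `k = 60`; the function name and example ids are illustrative, not taken from any of these codebases:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: merge ranked id lists from heterogeneous
    retrievers (e.g. a vector index and FTS5/BM25), best-first.
    Each document's score is the sum of 1/(k + rank) over the lists
    in which it appears, so agreement across retrievers dominates."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Return ids sorted by fused score, highest first.
    return sorted(scores, key=scores.get, reverse=True)

# "b" and "c" appear in both lists, so they outrank "a" (top of only one):
fused = rrf_fuse([["a", "b", "c"], ["b", "c", "d"]])
# → ["b", "c", "a", "d"]
```

The large constant `k` damps the advantage of a single first-place ranking, which is why RRF is robust without score normalization across retrievers.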

## Paper Reference Summaries (Academic / Industry)

| Document | Author | Description |
|---|---|---|
| hu-evermembench | Hu et al. | EverMemBench benchmark for >1M-token multi-party, multi-group interleaved conversations; diagnoses multi-hop collapse, temporal/versioning difficulty, and retrieval-bottlenecked “memory awareness”. |
| zhang-live-evo | Zhang et al. | Live-Evo: online self-evolving agent memory with an experience bank + meta-guideline bank, contrastive “memory-on vs memory-off” feedback, and weight-based reinforcement/forgetting; evaluated on Prophet Arena + deep research (as reported). |
| shutova-structmemeval | Shutova et al. | StructMemEval benchmark for whether agents can organize memory into useful structures (trees/ledgers/state tracking), not just retrieve facts; includes hint vs no-hint evaluation to isolate “structure recognition” failures. |
| yan-gam | Yan et al. | GAM: just-in-time agent memory via lightweight memos + a universal page-store, plus a deep-research researcher that plans/searches/integrates/reflects over history to compile optimized context at runtime; strong long-context QA gains with higher latency (as reported). |
| yang-graph-based-agent-memory-taxonomy | Yang et al. | Graph-based Agent Memory survey: graph-centric taxonomy + lifecycle (extract/store/retrieve/evolve), storage structures (KG/temporal/hyper/hierarchical/hybrid), retrieval operators, evolution/maintenance, and resources/benchmarks; useful shared vocabulary for shisad. |
| zhang-survey-memory-mechanism | Zhang et al. | Survey on memory mechanisms for LLM agents: definitions, why memory, design axes (sources/forms/ops), evaluation approaches, and application domains; good baseline checklist alongside newer benchmarks/systems. |
| hu-memory-age-ai-agents | Hu et al. | Memory in the Age of AI Agents survey: proposes unified lenses of forms (token/parametric/latent), functions (factual/experiential/working), and dynamics (formation/evolution/retrieval), plus benchmarks/frameworks and trustworthiness frontiers. |
| li-locomoplus | Li et al. | LoCoMo-Plus: evaluates beyond-factual “cognitive memory” (latent constraints like state/goals/values) under cue–trigger semantic disconnect, using constraint-consistency + LLM-judge evaluation. |
| maharana-locomo | Maharana et al. | LoCoMo dataset + benchmark for very long-term multi-session conversations (300 turns, multimodal) grounded in personas + temporal event graphs; evaluates QA + event summarization + multimodal generation. |
| wu-longmemeval | Wu et al. | LongMemEval benchmark + design decomposition (indexing → retrieval → reading) and system optimizations (value granularity, key expansion, time-aware query expansion). |
| packer-memgpt | Packer et al. | MemGPT: OS-inspired hierarchical memory + paging between a fixed-context LLM prompt and external stores (recall + archival), with function-call memory ops and event-driven control flow; foundational baseline for external agent memory. |
| chhikara-mem0 | Chhikara et al. | Mem0: production-oriented long-term memory pipeline with explicit ops (ADD/UPDATE/DELETE/NOOP) and an optional graph memory variant; reports quality + token/latency tradeoffs on LoCoMo. |
| liu-simplemem | Liu et al. | SimpleMem: write-time semantic structured compression + online synthesis + intent-aware retrieval planning (multi-view dense/BM25/symbolic retrieval with union+dedup) to improve LoCoMo/LongMemEval quality while cutting token cost (as reported). |
| xu-a-mem | Xu et al. | A‑Mem: Zettelkasten-inspired note network with LLM-driven link generation and “memory evolution” (updating older note attributes as new evidence arrives); strong LoCoMo multi-hop/temporal gains with far lower token lengths than full-context (as reported). |
| salama-meminsight | Salama et al. | MemInsight: autonomous memory augmentation that mines/annotates attributes (entity-centric + conversation-centric; turn/session granularity) and uses attribute-guided retrieval; large LoCoMo retrieval recall gains vs DPR RAG baseline (as reported). |
| rasmussen-zep | Rasmussen et al. | Zep: production memory layer built on Graphiti, a bi-temporal knowledge graph (episodes → entities/facts → communities) with validity intervals and invalidation-based corrections; evaluated on DMR + LongMemEval. |
| nan-nemori | Nan et al. | Nemori: cognitively-inspired self-organizing agent memory with semantic episode boundary detection + episodic narratives and a predict-calibrate loop that distills semantic knowledge from prediction gaps; strong LoCoMo + LongMemEvalS results (as reported). |
| li-memos | Li et al. | MemOS: OS-like memory control plane with MemCube (payload+metadata), lifecycle/scheduling, governance (ACL/TTL/audit), and multi-substrate memory (plaintext/activation/KV/parameter/LoRA). |
| yan-memory-r1 | Yan et al. | Memory-R1: reinforcement-learned memory manager (ADD/UPDATE/DELETE/NOOP) + answer agent with learned memory distillation; data-efficient RL (PPO/GRPO) training with exact-match reward. |
| jonelagadda-mnemosyne | Jonelagadda et al. | Mnemosyne: edge-friendly graph memory with substance/redundancy filters, probabilistic recall with decay/refresh, and a fixed-budget “core summary” for persona-level context. |
| patel-engram | Patel et al. | ENGRAM: lightweight typed memory (episodic/semantic/procedural) with simple dense retrieval + strict evidence budgets; strong LoCoMo + LongMemEval results with low token usage. |
| wei-evo-memory | Wei et al. | Evo-Memory: streaming benchmark + framework for self-evolving memory and experience reuse; introduces ExpRAG and ReMem (Think/Act/Refine) baselines and robustness/efficiency metrics. |
| cao-remember-me-refine-me | Cao et al. | ReMe: dynamic procedural memory lifecycle (acquire→reuse→refine) with multi-faceted distillation from success/failure trajectories, scenario-aware retrieval, and utility-based pruning; strong BFCL‑V3/AppWorld results (as reported). |
| sarin-memoria | Sarin et al. | Memoria: personalization memory layer combining session summaries + KG triplets (persona) with exponential recency weighting; SQLite + ChromaDB architecture and LongMemEvals subset results. |
| latimer-hindsight | Latimer et al. | Hindsight: retain/recall/reflect architecture separating evidence vs beliefs vs summaries; temporal+entity memory graph with multi-channel retrieval fusion and belief confidence updates; very strong LongMemEval/LoCoMo results (as reported). |
| yu-agentic-memory | Yu et al. | AgeMem: RL-trained unified LTM+STM controller exposing memory ops as tool actions (add/update/delete/retrieve/summarize/filter) with a 3-stage curriculum and step-wise GRPO for credit assignment. |
| hu-evermemos | Hu et al. | EverMemOS: self-organizing “memory OS” with MemCells→MemScenes lifecycle, user profile consolidation, and necessity/sufficiency-guided recollection (verifier + query rewrite); strong LoCoMo/LongMemEval results (as reported). |
| li-timem | Li et al. | TiMem: temporal-hierarchical memory consolidation (segment→session→day→week→profile) with query-complexity recall planning + gating; strong LoCoMo/LongMemEval-S accuracy with low recalled tokens (as reported). |
| zhang-himem | Zhang et al. | HiMem: hierarchical long-term memory split (Episode Memory + Note Memory) with topic+surprise episode segmentation, note-first “best-effort” retrieval w/ sufficiency checks, and conflict-aware reconsolidation; strong LoCoMo results (as reported). |
| behrouz-nested-learning | Behrouz et al. | Nested Learning / CMS / Hope: reframes memory as multi-timescale update dynamics (continuum memory blocks updated at different frequencies) with implications for consolidation and “corrections without forgetting”. |
| zhang-recursive-language-models | Zhang et al. | Recursive Language Models (RLMs): inference-time recursion + REPL state treats long prompts as an external environment; processes multi‑million-token inputs with sub-calls and programmatic slicing, often beating long-context scaffolds at comparable average cost (as reported). |
| wang-m-plus | Wang et al. | M+: latent-space long-term memory extension to MemoryLLM that stores dropped memory tokens in an LTM pool and retrieves them during generation with a co-trained retriever; extends retention to >160k tokens at similar GPU memory cost (as reported). |
| dong-minja | Dong et al. | MINJA: practical memory injection attack on “memory-as-demonstrations” agents via query-only interaction (bridging steps + progressive shortening); motivates write-time gates, isolation, and safer memory representations. |
| sunil-memory-poisoning-attack-defense | Sunil et al. | Memory poisoning attack & defense: empirical MINJA follow-up in EHR agents; shows pre-existing benign memory can reduce ASR, and that trust-score defenses can fail via over-conservatism or overconfidence. |
| anokhin-arigraph | Anokhin et al. | AriGraph: knowledge-graph world model that links episodic observation nodes to extracted semantic triplets; two-stage retrieval (semantic→episodic) for planning/exploration in text-game environments. |
| behrouz-titans | Behrouz et al. | Titans: long-context architecture with an online-updated neural memory module (test-time learning) plus persistent task memory; provides explicit primitives for surprise-based salience and forgetting. |
| ahn-hema | Ahn | HEMA: hippocampus-inspired dual memory for long conversations (running compact summary + FAISS episodic vector store) with explicit prompt budgeting, pruning (“semantic forgetting”), and summary-of-summaries consolidation. |
| tan-membench | Tan et al. | MemBench: benchmark/dataset for agent memory covering participation vs observation scenarios and factual vs reflective memory, with metrics for accuracy/recall/capacity and read/write-time efficiency. |
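A recurring primitive above is Zep/Graphiti-style bi-temporal validity (also adopted by memv): each fact carries both a transaction time (when the system learned it) and a world-time validity interval, and corrections close the interval instead of deleting the row. A minimal sketch under assumed field names; the papers' actual schemas and APIs differ:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Fact:
    """Bi-temporal fact: recorded_at is transaction time (ingestion),
    valid_from/valid_to bound when the fact held in the world.
    Field names here are illustrative, not Graphiti's schema."""
    statement: str
    recorded_at: int
    valid_from: int
    valid_to: Optional[int] = None  # None = still valid

class FactStore:
    def __init__(self):
        self.facts = []

    def assert_fact(self, statement, now):
        self.facts.append(Fact(statement, recorded_at=now, valid_from=now))

    def invalidate(self, statement, now):
        """Correction = close the validity interval, never delete:
        the superseded fact stays queryable 'as of' earlier times."""
        for f in self.facts:
            if f.statement == statement and f.valid_to is None:
                f.valid_to = now

    def current(self, t):
        """Facts valid at world time t."""
        return [f.statement for f in self.facts
                if f.valid_from <= t and (f.valid_to is None or f.valid_to > t)]

store = FactStore()
store.assert_fact("user lives in Tokyo", now=1)
store.invalidate("user lives in Tokyo", now=5)
store.assert_fact("user lives in Osaka", now=5)
# store.current(3) → ["user lives in Tokyo"]; store.current(6) → ["user lives in Osaka"]
```

The payoff is that “what did the agent believe last week?” and “what is true now?” become two different, answerable queries over the same store.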

## Deep Dive Analyses

Root-level critical analyses intended for synthesis work. These reference the summaries above, but focus on coherence, evidence quality, risks, and synthesis-ready claim framing.

| Synthesis | Based on | Focus |
|---|---|---|
| ANALYSIS | ANALYSIS-*.md + shisad docs + Mem0/Letta baselines | Cross-system comparison (techniques + memory types), plus mapping to shisad and “traditional” RAG-ish memory |
| ANALYSIS-academic-industry | paper ANALYSIS-arxiv-*.md + shisad plan | Academic/industry synthesis: benchmarks vs systems vs attacks, with “what’s missing in shisad” framing |
| Benchmarks best practices | Public disputes, audits, our analysis | Known pitfalls, metric confusion, dataset quality issues, per-benchmark limitations |
| MELT benchmark design | ANALYSIS.md systems + Reality Check epistemic docs | Memory Evaluation for Lifecycle Testing — session-replay benchmark testing the full memory lifecycle (decay, consolidation, contradiction, core stability, inference) at 6 scale tiers over simulated time. Separate repo; draft. |
| Analysis | Based on | Focus |
|---|---|---|
| ANALYSIS-jumperz-agent-memory-stack | references/jumperz-agent-memory-stack.md | Checklist critique (semantics, failure modes, missing evaluation), synthesis-ready takeaways + claims table |
| ANALYSIS-joelhooks-adr-0077-memory-system-next-phase | references/joelhooks-adr-0077-memory-system-next-phase.md | Increment plan critique (decay, rewrite, dedup, echo/fizzle), validation plan + claims |
| ANALYSIS-coolmanns-openclaw-memory-architecture | references/coolmanns-openclaw-memory-architecture.md + vendor/openclaw-memory-architecture/ | Layered stack critique with benchmark-method verification, operational risks, doc drift notes |
| ANALYSIS-drag88-agent-output-degradation | references/drag88-agent-output-degradation.md | Convergence + enforcement pattern critique (judge→rule distillation), measurement gaps, risks |
| ANALYSIS-versatly-clawvault | references/versatly-clawvault.md + vendor/clawvault/ | Product/tooling critique (surface area, hooks, qmd dependency), security posture, missing benchmarks |
| ANALYSIS-vstorm-memv | references/vstorm-memv.md + vendor/memv/ | Implementation critique of Nemori-inspired predict-calibrate extraction + bi-temporal validity + hybrid retrieval, with gaps/risks and shisad mapping |
| ANALYSIS-openviking | vendor/openviking/ + Hermes provider docs | Open-source context database: viking:// filesystem, L0/L1/L2 tiered loading, session-commit extraction across 8 memory categories, and hierarchical typed retrieval over memory/resources/skills; strong observability with heavier operational complexity |
| ANALYSIS-byterover-cli | vendor/byterover-cli/ + vendor/byterover-cli/paper/ | Agent-native coding-agent memory/runtime: daemon + per-project agent pool, markdown context tree with explicit relations and lifecycle, 5-tier progressive retrieval with cache/OOD detection, and strong self-reported benchmarks with caveats |
| ANALYSIS-mira-OSS | vendor/mira-OSS/ | Full-stack event-driven agent (v1 rev 2): activity-day sigmoid decay, hub discovery + 3-axis linking (vector+entity+TF-IDF), Text-Based LoRA + user model synthesis with critic validation, background forage agent (sub-agent collaboration), portrait synthesis, 16 tools, context overflow remediation, immutable domain models, multi-user RLS + Vault; gaps in write gating, external benchmarks, taint tracking, and sub-agent capability scoping |
| ANALYSIS-claude-code-memory | Source: /home/lhl/Downloads/claude-code/src | Claude Code memory subsystem (Anthropic): first-party production-scale memory system; flat-file MEMORY.md + typed topic files (user/feedback/project/reference) + background extraction via forked agent with mutual exclusion + LLM-based relevance selection (Sonnet) + team memory with OAuth sync + auto dream consolidation + KAIROS daily-log mode + eval-validated prompts with case IDs + security-hardened path validation; no vector search, no graph, no decay scoring |
| ANALYSIS-codex-memory | openai/codex | Codex memory subsystem (OpenAI): first-party open-source coding agent; two-phase async pipeline (gpt-5.1-codex-mini extraction → gpt-5.3-codex consolidation) + SQLite-backed job coordination (leases/heartbeats/watermarks) + progressive disclosure layout (memory_summary → MEMORY.md → rollout_summaries → skills) + skills as procedural memory + usage-based citation-driven retention + thread-diff incremental forgetting + ~1,400 lines extraction/consolidation prompts; no vector search, no team memory, no real-time extraction |
| ANALYSIS-google-always-on-memory-agent | vendor/always-on-memory-agent/ | Official Google ADK sample: always-on daemon with multimodal ingestion (27 file types via Gemini 3.1 Flash-Lite), periodic LLM consolidation, SQLite storage, HTTP API + Streamlit dashboard; no retrieval/search (recency scan LIMIT 50), no decay/dedup/versioning; useful as ADK orchestration reference and multimodal ingestion pattern |
| ANALYSIS-supermemory | references/supermemory.md + vendor/supermemory/ | Memory-as-a-service startup: memory versioning (linked-list chains via parentMemoryId/rootMemoryId/isLatest), typed relationship ontology (updates/extends/derives), static/dynamic profile synthesis API, time-based forgetting with audit trail, multi-model embedding columns, MemoryBench framework; open-source repo is SDK/frontend only — core engine logic is proprietary hosted backend |
| ANALYSIS-karta | vendor/karta/ | Karta (rohithzr): Rust (~10.4K LOC) agentic memory library with Zettelkasten-inspired knowledge graph, 7-type dream engine (deduction/induction/abduction/consolidation/contradiction/episode digest/cross-episode digest) with inference feedback into retrieval, embedding-based query classification (6 modes), retroactive context evolution with drift protection, cross-encoder reranking with abstention, multi-hop BFS traversal, atomic fact decomposition with per-fact embeddings, foresight signals with TTL, structured episode digests; BEAM 100K: 57.7% with 243-failure root cause catalog |
## Paper Deep Dive Analyses (Academic / Industry)

| Analysis | Based on | Focus |
|---|---|---|
| ANALYSIS-arxiv-2602.01313-evermembench | references/hu-evermembench.md + references/papers/arxiv-2602.01313.pdf | Benchmark critique emphasizing version semantics, multi-party fragmentation, oracle diagnostics, and shisad mapping |
| ANALYSIS-arxiv-2602.02369-live-evo | references/zhang-live-evo.md + references/papers/arxiv-2602.02369.pdf | System deep dive emphasizing online experience weighting from continuous feedback, meta-guidelines for memory compilation, and memory-on vs memory-off utility measurement; shisad mapping for feedback loops + procedural memory gating |
| ANALYSIS-arxiv-2602.11243-structmemeval | references/shutova-structmemeval.md + references/papers/arxiv-2602.11243.pdf | Benchmark deep dive emphasizing memory organization/structure as a distinct capability (trees/ledgers/state), hint vs no-hint diagnostics, and implications for shisad structured-memory primitives |
| ANALYSIS-arxiv-2602.05665-graph-based-agent-memory-taxonomy | references/yang-graph-based-agent-memory-taxonomy.md + references/papers/arxiv-2602.05665.pdf | Survey deep dive providing graph-based memory taxonomy and lifecycle (extract/store/retrieve/evolve), with implications for shisad graph-as-derived-view, operator choices, and maintenance jobs |
| ANALYSIS-arxiv-2404.13501-survey-memory-mechanism | references/zhang-survey-memory-mechanism.md + references/papers/arxiv-2404.13501.pdf | Survey deep dive providing baseline taxonomy and evaluation checklists for agent memory; useful coverage reference alongside newer benchmarks/systems for shisad’s roadmap |
| ANALYSIS-arxiv-2512.13564-memory-age-ai-agents | references/hu-memory-age-ai-agents.md + references/papers/arxiv-2512.13564.pdf | Survey deep dive emphasizing the Forms–Functions–Dynamics taxonomy and frontiers (RL integration, multimodal, multi-agent shared memory, trustworthiness), used as organizing frame for shisad v0.7 memory roadmap |
| ANALYSIS-arxiv-2402.17753-locomo | references/maharana-locomo.md + references/papers/arxiv-2402.17753.pdf | Dataset/benchmark critique with episodic-memory implications (event graphs, multimodal, RAG harm) and shisad mapping |
| ANALYSIS-arxiv-2410.10813-longmemeval | references/wu-longmemeval.md + references/papers/arxiv-2410.10813.pdf | Benchmark and system-design decomposition (indexing/retrieval/reading), with mapping to shisad primitives |
| ANALYSIS-arxiv-2310.08560-memgpt | references/packer-memgpt.md + references/papers/arxiv-2310.08560.pdf | System deep dive emphasizing virtual context management (OS paging), memory tiers (working/queue/recall/archival), function-call memory ops, and implications for shisad versioned corrections + write-policy hardening |
| ANALYSIS-arxiv-2602.10715-locomoplus | references/li-locomoplus.md + references/papers/arxiv-2602.10715.pdf | Beyond-factual “cognitive memory” benchmark critique (latent constraints) and implications for safe constraint/procedural memory |
| ANALYSIS-arxiv-2504.19413-mem0 | references/chhikara-mem0.md + references/papers/arxiv-2504.19413.pdf | System deep dive emphasizing explicit memory ops, graph-memory tradeoffs, deployment metrics (tokens/p95), and shisad mapping (versioned corrections vs delete) |
| ANALYSIS-arxiv-2601.02553-simplemem | references/liu-simplemem.md + references/papers/arxiv-2601.02553.pdf | System deep dive emphasizing write-time semantic structured compression, online consolidation, and intent-aware multi-view retrieval planning; mapping to shisad “derived vs raw” memory + retrieval budgeting |
| ANALYSIS-arxiv-2502.12110-a-mem | references/xu-a-mem.md + references/papers/arxiv-2502.12110.pdf | System deep dive emphasizing Zettelkasten-style notes + LLM-driven linking + memory evolution, with strong multi-hop/temporal LoCoMo gains but high versioning/audit requirements for shisad |
| ANALYSIS-arxiv-2503.21760-meminsight | references/salama-meminsight.md + references/papers/arxiv-2503.21760.pdf | System deep dive emphasizing autonomous attribute mining/annotation as a derived metadata layer to improve retrieval recall and downstream tasks; mapping to shisad schema constraints + provenance/versioning |
| ANALYSIS-arxiv-2511.18423-gam | references/yan-gam.md + references/papers/arxiv-2511.18423.pdf | System deep dive emphasizing just-in-time context compilation via memo index + universal page-store and an iterative deep-research researcher; highlights the latency/quality trade-off and mapping to shisad evidence-first episodic storage |
| ANALYSIS-arxiv-2501.13956-zep | references/rasmussen-zep.md + references/papers/arxiv-2501.13956.pdf | System deep dive emphasizing bi-temporal validity semantics, episodic+semantic+community graph tiers, hybrid retrieval (BM25/embeddings/BFS), and implications for shisad versioned memory |
| ANALYSIS-arxiv-2507.03724-memos | references/li-memos.md + references/papers/arxiv-2507.03724.pdf | System deep dive emphasizing MemCube metadata, multi-substrate memory (plaintext/KV/parameter), lifecycle/scheduling/governance, and mapping to shisad primitives |
| ANALYSIS-arxiv-2508.19828-memory-r1 | references/yan-memory-r1.md + references/papers/arxiv-2508.19828.pdf | RL deep dive emphasizing learned memory ops (ADD/UPDATE/DELETE/NOOP) + post-retrieval memory distillation, reward design, and what’s required to safely adopt this in shisad |
| ANALYSIS-arxiv-2508.03341-nemori | references/nan-nemori.md + references/papers/arxiv-2508.03341.pdf | System deep dive emphasizing episode segmentation (Two-Step Alignment) + predict-calibrate semantic distillation, reported LoCoMo/LongMemEvalS gains, and implications for shisad write gating + correction semantics |
| ANALYSIS-arxiv-2510.08601-mnemosyne | references/jonelagadda-mnemosyne.md + references/papers/arxiv-2510.08601.pdf | System deep dive emphasizing edge-first graph memory, redundancy/refresh, probabilistic decay-based recall, and a fixed-budget core/persona summary; includes evaluation-rigor cautions |
| ANALYSIS-arxiv-2511.12960-engram | references/patel-engram.md + references/papers/arxiv-2511.12960.pdf | System deep dive emphasizing typed memory (episodic/semantic/procedural), deterministic routing/formatting, strict evidence budgets, and strong token/latency results; mapping to shisad primitives |
| ANALYSIS-arxiv-2511.20857-evo-memory | references/wei-evo-memory.md + references/papers/arxiv-2511.20857.pdf | Benchmark deep dive emphasizing streaming task-sequence evaluation for experience reuse, plus refine/prune mechanisms and metrics (robustness, step efficiency) for shisad’s eval harness |
| ANALYSIS-arxiv-2512.10696-remember-me-refine-me | references/cao-remember-me-refine-me.md + references/papers/arxiv-2512.10696.pdf | System deep dive emphasizing procedural memory distillation + scenario-aware reuse + utility-based refinement/pruning; mapping to shisad procedural tier + versioned invalidation vs delete |
| ANALYSIS-arxiv-2512.12686-memoria | references/sarin-memoria.md + references/papers/arxiv-2512.12686.pdf | System deep dive emphasizing persona KG + session summaries with recency-weighted retrieval; highlights missing governance/versioning primitives needed for shisad |
| ANALYSIS-arxiv-2512.12818-hindsight | references/latimer-hindsight.md + references/papers/arxiv-2512.12818.pdf | System deep dive emphasizing retain/recall/reflect with four-network memory (facts/experiences/observations/beliefs), token-budgeted multi-channel retrieval fusion, and belief confidence updates; key shisad mapping |
| ANALYSIS-arxiv-2601.01885-agentic-memory | references/yu-agentic-memory.md + references/papers/arxiv-2601.01885.pdf | RL deep dive emphasizing unified LTM+STM memory ops as tool actions, 3-stage training curriculum, step-wise GRPO credit assignment, and implications for shisad’s future learned memory policies |
| ANALYSIS-arxiv-2601.02163-evermemos | references/hu-evermemos.md + references/papers/arxiv-2601.02163.pdf | System deep dive emphasizing MemCell→MemScene consolidation lifecycle, user profile/foresight, and sufficiency-verified scene-guided retrieval; mapping to shisad consolidation roadmap |
| ANALYSIS-arxiv-2601.02845-timem | references/li-timem.md + references/papers/arxiv-2601.02845.pdf | System deep dive emphasizing temporal-hierarchical consolidation (TMT), query-complexity recall planning/gating, and the accuracy–token frontier; mapping to shisad temporal tiers |
| ANALYSIS-arxiv-2601.06377-himem | references/zhang-himem.md + references/papers/arxiv-2601.06377.pdf | System deep dive emphasizing Episode Memory + Note Memory hierarchy, note-first “best-effort” retrieval w/ sufficiency checks, and conflict-aware reconsolidation; mapping to shisad event→knowledge tiers + versioned updates |
| ANALYSIS-arxiv-2512.24695-nested-learning | references/behrouz-nested-learning.md + references/papers/arxiv-2512.24695.pdf | Conceptual deep dive on multi-timescale “continuum memory” and consolidation dynamics; mapping to shisad tiered memory + versioned corrections |
| ANALYSIS-arxiv-2512.24601-recursive-language-models | references/zhang-recursive-language-models.md + references/papers/arxiv-2512.24601.pdf | Architecture deep dive emphasizing RLM-style programmatic reading/compilation over arbitrarily long evidence stores (REPL + recursion + sub-calls), with implications for shisad sandboxed compilation traces and cost tail management |
| ANALYSIS-arxiv-2502.00592-m-plus | references/wang-m-plus.md + references/papers/arxiv-2502.00592.pdf | Architecture deep dive emphasizing latent-space long-term memory tokens + co-trained retrieval for >160k retention, with mapping to shisad’s external evidence-first memory and retrieval diagnostics |
| ANALYSIS-arxiv-2503.03704-minja | references/dong-minja.md + references/papers/arxiv-2503.03704.pdf | Security deep dive on query-only memory injection attacks; implications for write-policy, provenance/taint, isolation, and “don’t store demonstrations” patterns |
| ANALYSIS-arxiv-2601.05504-memory-poisoning-attack-defense | references/sunil-memory-poisoning-attack-defense.md + references/papers/arxiv-2601.05504.pdf | Security deep dive emphasizing ISR vs ASR under realistic memory conditions, and why trust-score sanitization can fail; concrete shisad hardening takeaways |
| ANALYSIS-arxiv-2407.04363-arigraph | references/anokhin-arigraph.md + references/papers/arxiv-2407.04363.pdf | System deep dive emphasizing episodic↔semantic memory linking, graph-structured retrieval for planning/exploration, and implications for shisad episode objects + provenance + correction semantics |
| ANALYSIS-arxiv-2501.00663-titans | references/behrouz-titans.md + references/papers/arxiv-2501.00663.pdf | Architecture deep dive emphasizing test-time-learning neural memory (surprise/momentum/forgetting), Titans MAC/MAG/MAL variants, and how to translate salience/decay ideas into shisad’s external memory framework |
| ANALYSIS-arxiv-2504.16754-hema | references/ahn-hema.md + references/papers/arxiv-2504.16754.pdf | System deep dive emphasizing dual memory (summary + vector store), explicit prompt budgeting, pruning/consolidation policies, and evaluation-rigor cautions for shisad adoption |
| ANALYSIS-arxiv-2506.21605-membench | references/tan-membench.md + references/papers/arxiv-2506.21605.pdf | Benchmark deep dive emphasizing multi-scenario (participant vs observer) and multi-level (factual vs reflective) evaluation, plus latency/capacity metrics and implications for shisad eval harnesses |
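Several of the deep dives above (Mem0, Memory-R1, AgeMem) share an explicit memory-operation vocabulary: a manager emits ADD/UPDATE/DELETE/NOOP decisions that a dumb executor applies to the store. A minimal sketch of that executor contract over a key→fact dict; the dict shape and guard semantics are illustrative, not either paper's API:

```python
from enum import Enum

class MemOp(Enum):
    """The shared op vocabulary from Mem0 / Memory-R1 / AgeMem."""
    ADD = "ADD"
    UPDATE = "UPDATE"
    DELETE = "DELETE"
    NOOP = "NOOP"

def apply_op(store, op, key, value=None):
    """Apply one memory op to a key->fact store. The executor is
    deliberately conservative: ADD only creates, UPDATE only mutates
    existing keys, DELETE is idempotent, NOOP changes nothing."""
    if op is MemOp.ADD and key not in store:
        store[key] = value
    elif op is MemOp.UPDATE and key in store:
        store[key] = value
    elif op is MemOp.DELETE:
        store.pop(key, None)
    # NOOP and invalid op/key combinations fall through unchanged
    return store

store = apply_op({}, MemOp.ADD, "city", "Tokyo")
store = apply_op(store, MemOp.UPDATE, "city", "Osaka")
# → {"city": "Osaka"}
```

Separating the decision (LLM or RL policy) from this deterministic executor is what makes the op stream auditable, and it is also where versioned-invalidation variants would replace the raw DELETE branch.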

## Source Threads & Links

| Source | URL |
|---|---|
| @jumperz memory stack thread | https://x.com/jumperz/status/2024841165774717031 |
| @joelhooks ADR tweet | https://x.com/joelhooks/status/2024947701738262773 |
| joelclaw ADR-0077 | https://joelclaw.com/adrs/0077-memory-system-next-phase |
| @drag88 article | https://x.com/drag88/status/2022551759491862974 |
| supermemory docs | https://supermemory.ai/docs |
| supermemory repo | https://github.com/supermemoryai/supermemory |
| mempalace repo | https://github.com/milla-jovovich/mempalace |
| karta repo | https://github.com/rohithzr/karta |

## File Tree

agentic-memory/
├── README.md                          ← this file
├── ANALYSIS.md                         ← synthesis + comparison
├── ANALYSIS-academic-industry.md       ← academic/industry synthesis
├── ANALYSIS-jumperz-agent-memory-stack.md
├── ANALYSIS-joelhooks-adr-0077-memory-system-next-phase.md
├── ANALYSIS-coolmanns-openclaw-memory-architecture.md
├── ANALYSIS-drag88-agent-output-degradation.md
├── ANALYSIS-versatly-clawvault.md
├── ANALYSIS-vstorm-memv.md
├── ANALYSIS-mira-OSS.md
├── ANALYSIS-codex-memory.md
├── ANALYSIS-google-always-on-memory-agent.md
├── ANALYSIS-supermemory.md
├── ANALYSIS-karta.md               ← Karta: Rust agentic memory library with dream engine
├── ANALYSIS-mempalace.md           ← not in ANALYSIS.md (claims-vs-code issues); see REVIEWED.md
├── REVIEWED.md                        ← triage log (examined but not promoted to ANALYSIS)
├── PUNCHLIST-academic-industry.md     ← tracking checklist for paper deep dives
├── templates/                         ← templates for paper analyses/summaries
│
├── references/                        ← summarized reference docs (markdown w/ frontmatter)
│   ├── 1-full-agent-memory-build.jpg  ← jumperz card 1: memory storage
│   ├── 2-feeds-into.jpg               ← jumperz card 2: memory intelligence
│   ├── jumperz-agent-memory-stack.md
│   ├── joelhooks-adr-0077-memory-system-next-phase.md
│   ├── coolmanns-openclaw-memory-architecture.md
│   ├── drag88-agent-output-degradation.md
│   ├── versatly-clawvault.md
│   ├── hu-evermembench.md
│   ├── li-locomoplus.md
│   ├── maharana-locomo.md
│   ├── wu-longmemeval.md
│   ├── chhikara-mem0.md
│   └── papers/                        ← archived PDFs + text snapshots
│       ├── README.md
│       ├── arxiv-*.pdf
│       └── arxiv-*.md
│
└── vendor/                            ← cloned source repos
    ├── mira-OSS/                      ← github.com/taylorsatula/mira-OSS (snapshot, AGPLv3)
    │   ├── README.md
    │   ├── CLAUDE.md                  ← project guide (architecture, patterns, principles)
    │   ├── main.py                    ← FastAPI entry point
    │   ├── cns/                       ← Central Nervous System (conversation orchestration)
    │   │   ├── api/                   ← FastAPI endpoints (chat, actions, data, health)
    │   │   ├── core/                  ← Domain models (Continuum, Message, Events)
    │   │   ├── services/              ← Orchestrator, subcortical, summary, collapse handler
    │   │   └── infrastructure/        ← Repositories, Valkey cache, unit of work
    │   ├── lt_memory/                 ← Long-term memory system
    │   │   ├── scoring_formula.sql    ← Multi-factor activity-day sigmoid importance scoring
    │   │   ├── models.py             ← Memory, Entity, ExtractedMemory, link types
    │   │   ├── hybrid_search.py      ← BM25 + pgvector with RRF
    │   │   ├── proactive.py          ← Dual-path retrieval (similarity + hub discovery)
    │   │   ├── hub_discovery.py      ← Entity-driven memory retrieval via pg_trgm
    │   │   └── processing/           ← Extraction, consolidation, entity GC pipelines
    │   ├── working_memory/           ← System prompt composition via trinkets
    │   ├── tools/                    ← Self-registering tool framework (11 built-in)
    │   ├── config/                   ← Pydantic config + prompt templates
    │   └── auth/                     ← WebAuthn + magic link authentication
    │
    ├── openclaw-memory-architecture/  ← github.com/coolmanns/openclaw-memory-architecture
    │   ├── README.md
    │   ├── PROJECT.md
    │   ├── CHANGELOG.md
    │   ├── docs/
    │   │   ├── ARCHITECTURE.md        ← full 12-layer technical reference
    │   │   ├── knowledge-graph.md     ← graph search pipeline, benchmarks
    │   │   ├── context-optimization.md
    │   │   ├── embedding-setup.md
    │   │   ├── benchmark-process.md
    │   │   ├── benchmark-results.md
    │   │   ├── code-search.md
    │   │   └── COMPARISON.md
    │   ├── schema/
    │   │   └── facts.sql              ← SQLite schema for knowledge graph
    │   ├── scripts/                   ← init, seed, search, ingest, decay, benchmark, telemetry
    │   ├── templates/                 ← starter files (active-context, gating-policies, etc.)
    │   └── plugin-graph-memory/       ← OpenClaw plugin (JS)
    │
    ├── karta/                         ← github.com/rohithzr/karta (submodule, MIT)
    │   ├── Cargo.toml                ← workspace: karta-core + karta-cli
    │   ├── crates/
    │   │   └── karta-core/           ← Core engine (~6.7K LOC Rust)
    │   │       ├── src/
    │   │       │   ├── note.rs       ← MemoryNote, Provenance, NoteStatus, AtomicFact, Episode, EpisodeDigest
    │   │       │   ├── write.rs      ← Write path: index, link, evolve, foresight, facts
    │   │       │   ├── read.rs       ← Read path: classify, search, traverse, rerank, synthesize
    │   │       │   ├── rerank.rs     ← Jina/LLM/noop rerankers
    │   │       │   ├── dream/        ← Dream engine: 7 inference types
    │   │       │   ├── store/        ← LanceDB + SQLite implementations
    │   │       │   └── llm/          ← Provider trait + OpenAI + mock + prompts
    │   │       └── tests/            ← eval, beam_100k, bench_beam (~3.8K LOC)
    │   ├── findings.md               ← BEAM 100K detailed failure analysis
    │   └── plan.md                   ← Experiment plan targeting 90%+
    │
    ├── always-on-memory-agent/        ← GoogleCloudPlatform/generative-ai (official ADK sample)
    │   ├── agent.py                  ← ADK multi-agent daemon (ingest/consolidate/query)
    │   ├── dashboard.py              ← Streamlit UI
    │   └── docs/                     ← Logo/architecture assets
    │
    ├── memv/                          ← github.com/vstorm-co/memv
    │   ├── README.md
    │   ├── CHANGELOG.md
    │   ├── pyproject.toml             ← PyPI: memvee, v0.1.0
    │   ├── docs/                      ← docs site (MkDocs)
    │   ├── src/
    │   │   └── memv/                  ← segmentation, extraction, validity, retrieval, storage
    │   └── tests/
    │
    ├── supermemory/                    ← github.com/supermemoryai/supermemory (lean subset: schemas, SDK, MCP, arch docs)
    │   ├── LICENSE
    │   ├── README.md                  ← provenance + open-source vs hosted-backend split
    │   ├── packages/
    │   │   ├── validation/            ← Zod schemas (data model definitions)
    │   │   │   ├── schemas.ts
    │   │   │   └── api.ts
    │   │   ├── lib/
    │   │   │   ├── api.ts             ← reveals backend dependency (api.supermemory.ai)
    │   │   │   └── similarity.ts      ← client-side cosine sim (visualization only)
    │   │   └── tools/src/shared/
    │   │       └── memory-client.ts   ← SDK client (profile search, prompt formatting)
    │   ├── apps/mcp/src/
    │   │   └── server.ts              ← MCP server (memory/recall/whoAmI tools)
    │   └── skills/supermemory/references/
    │       └── architecture.md        ← claimed design (558 lines)
    │
    └── clawvault/                     ← github.com/Versatly/clawvault
        ├── README.md
        ├── PLAN.md                    ← issue #4: ledger, reflect, replay, archive
        ├── CHANGELOG.md
        ├── SKILL.md
        ├── package.json               ← npm: clawvault, v2.6.1
        ├── src/
        │   ├── commands/              ← archive, context, inject, observe, reflect, replay, wake, sleep, task, project, ...
        │   ├── observer/              ← compressor, reflector, router, session-watcher
        │   ├── lib/                   ← vault, memory-graph, ledger, observation-format, session-utils
        │   └── cli/
        ├── bin/                       ← CLI entry + command registration modules
        ├── hooks/                     ← OpenClaw hook handler
        ├── dashboard/                 ← web dashboard (vault parser, graph diff)
        ├── schemas/
        ├── scripts/
        ├── templates/
        └── tests/

Key Themes Across Sources

  • Phased build order matters: Core memory first (write/read/decay), reliability second (dedup/maintenance/recovery), intelligence last (graphs/trust/cross-agent). Building out of order amplifies flaws.
  • Tiered retrieval: Summary files first (fast, cheap), vector search fallback (thorough, expensive). Don't vector-search everything.
  • Score decay: final_score = relevance × exp(-λ × days) — recency-weighted relevance is universal across all architectures.
  • Feedback loops: Echo/fizzle (track which injected memories get used), behavior loops (extract corrections as lessons), learning loops (convert expensive LLM checks into cheap static rules).
  • SQLite over hosted vector DBs: At current scales (1K-5K entries), SQLite + FTS5 + local embeddings outperforms hosted solutions on latency, cost, and operational simplicity.
  • Multi-agent convergence: Shared memory creates homogenization pressure. Workspace isolation + file routing guards help but don't fully solve it.
  • Vault index pattern: Single scannable manifest (one-line descriptions) → load individual entries on demand. One file read instead of N.
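The score-decay formula above can be sketched directly. Expressing λ via a half-life makes the tuning knob intuitive; the 30-day default here is an illustrative assumption, not a value taken from any of the referenced stacks:

```python
import math

def decayed_score(relevance: float, age_days: float,
                  half_life_days: float = 30.0) -> float:
    """Recency-weighted relevance: final_score = relevance * exp(-lambda * days).

    lambda is derived from a half-life: after `half_life_days`, a memory's
    score is halved. The 30-day default is hypothetical.
    """
    lam = math.log(2) / half_life_days
    return relevance * math.exp(-lam * age_days)
```

With the default half-life, a memory scored 1.0 today retrieves at 0.5 after 30 days and 0.25 after 60, which is the recency pressure all three architectures apply in some form.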
