
# Agentic Memory Research

Research collection on agent memory architectures, persistence patterns, and output quality maintenance for LLM-based agent systems.

## Citation

If you reference this repo’s summaries/analyses in academic or professional work, please cite:

```bibtex
@misc{lin_agentic_memory_2026,
  author       = {Leonard Lin},
  title        = {agentic-memory: Agentic Memory Research Collection (Summaries and Analyses)},
  year         = {2026},
  howpublished = {GitHub repository},
  url          = {https://github.com/lhl/agentic-memory},
}
```

## Reference Summaries

| Document | Author | Description |
|---|---|---|
| jumperz-agent-memory-stack | @jumperz | 31-piece memory architecture split across 3 phases (Core → Reliability → Intelligence). Complete prompt/spec breakdowns for write pipeline, read pipeline, decay, knowledge graph, episodic memory, trust scoring, echo/fizzle feedback loops. The foundational reference that others build on. |
| joelhooks-adr-0077-memory-system-next-phase | @joelhooks | ADR for joelclaw (personal AI Mac Mini). Maps existing production system (~6 days running, Qdrant 1,343 points) against jumperz's 31 pieces. Plans 3 increments: retrieval quality (score decay, query rewriting), storage quality (dedup, nightly maintenance), feedback loop (echo/fizzle). Includes detailed gap analysis. |
| coolmanns-openclaw-memory-architecture | coolmanns | 12-layer production memory stack for OpenClaw with 14 agents. SQLite+FTS5 knowledge graph (3,108 facts), llama.cpp GPU embeddings (768d, 7ms), three runtime plugins (continuity, stability, graph-memory). 100% recall on a 60-query benchmark. Includes activation/decay system, domain RAG, session boot sequences. |
| drag88-agent-output-degradation | @drag88 (Aswin) | "Why Your Agent's Output Gets Worse Over Time" — the multi-agent convergence problem. 4-tier memory (working → episodic → semantic → procedural). 3-layer enforcement pipeline (YAML regex → Gemini LLM judge → self-learning loop). Core insight: convert expensive runtime LLM checks into free static regex rules over time. |
| versatly-clawvault | Versatly (@drag88) | ClawVault npm CLI tool — structured markdown memory vault with observation pipeline, knowledge graph, session lifecycle (wake/sleep/checkpoint), task/project primitives, Obsidian integration, OpenClaw hooks. 449+ tests. v2.6.1. |
| vstorm-memv | vstorm-co | memv (PyPI: memvee) — Nemori-inspired predict-calibrate extraction + episode segmentation, plus Graphiti-style bi-temporal validity and hybrid retrieval (sqlite-vec + FTS5 + RRF) on SQLite. |
| supermemory | Dhravya Shah / supermemoryai | Supermemory memory-as-a-service API: memory versioning (linked-list chains), typed relationships (updates/extends/derives), static/dynamic profile synthesis, time-based forgetting with reason tracking, multi-model embedding storage. Critical caveat: the open-source repo is frontend/SDK only; the core engine is a proprietary backend at api.supermemory.ai. |
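Several systems above fuse heterogeneous retrievers with Reciprocal Rank Fusion (memv combines sqlite-vec, FTS5, and RRF). A minimal sketch of the standard RRF formula, `score(d) = Σ 1/(k + rank_r(d))` with the conventional `k = 60`; the function name and example ids are illustrative, not taken from any of these codebases:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: merge ranked id lists from heterogeneous
    retrievers (e.g. a vector index and FTS5/BM25), best-first.
    Each document's score is the sum of 1/(k + rank) over the lists
    in which it appears, so agreement across retrievers dominates."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Return ids sorted by fused score, highest first.
    return sorted(scores, key=scores.get, reverse=True)

# "b" and "c" appear in both lists, so they outrank "a" (top of only one):
fused = rrf_fuse([["a", "b", "c"], ["b", "c", "d"]])
# → ["b", "c", "a", "d"]
```

The large constant `k` damps the advantage of a single first-place ranking, which is why RRF is robust without score normalization across retrievers.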

## Paper Reference Summaries (Academic / Industry)

| Document | Author | Description |
|---|---|---|
| hu-evermembench | Hu et al. | EverMemBench benchmark for >1M-token multi-party, multi-group interleaved conversations; diagnoses multi-hop collapse, temporal/versioning difficulty, and retrieval-bottlenecked “memory awareness”. |
| zhang-live-evo | Zhang et al. | Live-Evo: online self-evolving agent memory with an experience bank + meta-guideline bank, contrastive “memory-on vs memory-off” feedback, and weight-based reinforcement/forgetting; evaluated on Prophet Arena + deep research (as reported). |
| shutova-structmemeval | Shutova et al. | StructMemEval benchmark for whether agents can organize memory into useful structures (trees/ledgers/state tracking), not just retrieve facts; includes hint vs no-hint evaluation to isolate “structure recognition” failures. |
| yan-gam | Yan et al. | GAM: just-in-time agent memory via lightweight memos + a universal page-store, plus a deep-research researcher that plans/searches/integrates/reflects over history to compile optimized context at runtime; strong long-context QA gains with higher latency (as reported). |
| yang-graph-based-agent-memory-taxonomy | Yang et al. | Graph-based Agent Memory survey: graph-centric taxonomy + lifecycle (extract/store/retrieve/evolve), storage structures (KG/temporal/hyper/hierarchical/hybrid), retrieval operators, evolution/maintenance, and resources/benchmarks; useful shared vocabulary for shisad. |
| zhang-survey-memory-mechanism | Zhang et al. | Survey on memory mechanisms for LLM agents: definitions, why memory, design axes (sources/forms/ops), evaluation approaches, and application domains; good baseline checklist alongside newer benchmarks/systems. |
| hu-memory-age-ai-agents | Hu et al. | Memory in the Age of AI Agents survey: proposes unified lenses of forms (token/parametric/latent), functions (factual/experiential/working), and dynamics (formation/evolution/retrieval), plus benchmarks/frameworks and trustworthiness frontiers. |
| li-locomoplus | Li et al. | LoCoMo-Plus: evaluates beyond-factual “cognitive memory” (latent constraints like state/goals/values) under cue–trigger semantic disconnect, using constraint-consistency + LLM-judge evaluation. |
| maharana-locomo | Maharana et al. | LoCoMo dataset + benchmark for very long-term multi-session conversations (300 turns, multimodal) grounded in personas + temporal event graphs; evaluates QA + event summarization + multimodal generation. |
| wu-longmemeval | Wu et al. | LongMemEval benchmark + design decomposition (indexing → retrieval → reading) and system optimizations (value granularity, key expansion, time-aware query expansion). |
| packer-memgpt | Packer et al. | MemGPT: OS-inspired hierarchical memory + paging between a fixed-context LLM prompt and external stores (recall + archival), with function-call memory ops and event-driven control flow; foundational baseline for external agent memory. |
| chhikara-mem0 | Chhikara et al. | Mem0: production-oriented long-term memory pipeline with explicit ops (ADD/UPDATE/DELETE/NOOP) and an optional graph memory variant; reports quality + token/latency tradeoffs on LoCoMo. |
| liu-simplemem | Liu et al. | SimpleMem: write-time semantic structured compression + online synthesis + intent-aware retrieval planning (multi-view dense/BM25/symbolic retrieval with union+dedup) to improve LoCoMo/LongMemEval quality while cutting token cost (as reported). |
| xu-a-mem | Xu et al. | A‑Mem: Zettelkasten-inspired note network with LLM-driven link generation and “memory evolution” (updating older note attributes as new evidence arrives); strong LoCoMo multi-hop/temporal gains with far lower token lengths than full-context (as reported). |
| salama-meminsight | Salama et al. | MemInsight: autonomous memory augmentation that mines/annotates attributes (entity-centric + conversation-centric; turn/session granularity) and uses attribute-guided retrieval; large LoCoMo retrieval recall gains vs DPR RAG baseline (as reported). |
| rasmussen-zep | Rasmussen et al. | Zep: production memory layer built on Graphiti, a bi-temporal knowledge graph (episodes → entities/facts → communities) with validity intervals and invalidation-based corrections; evaluated on DMR + LongMemEval. |
| nan-nemori | Nan et al. | Nemori: cognitively-inspired self-organizing agent memory with semantic episode boundary detection + episodic narratives and a predict-calibrate loop that distills semantic knowledge from prediction gaps; strong LoCoMo + LongMemEvalS results (as reported). |
| li-memos | Li et al. | MemOS: OS-like memory control plane with MemCube (payload+metadata), lifecycle/scheduling, governance (ACL/TTL/audit), and multi-substrate memory (plaintext/activation/KV/parameter/LoRA). |
| yan-memory-r1 | Yan et al. | Memory-R1: reinforcement-learned memory manager (ADD/UPDATE/DELETE/NOOP) + answer agent with learned memory distillation; data-efficient RL (PPO/GRPO) training with exact-match reward. |
| jonelagadda-mnemosyne | Jonelagadda et al. | Mnemosyne: edge-friendly graph memory with substance/redundancy filters, probabilistic recall with decay/refresh, and a fixed-budget “core summary” for persona-level context. |
| patel-engram | Patel et al. | ENGRAM: lightweight typed memory (episodic/semantic/procedural) with simple dense retrieval + strict evidence budgets; strong LoCoMo + LongMemEval results with low token usage. |
| wei-evo-memory | Wei et al. | Evo-Memory: streaming benchmark + framework for self-evolving memory and experience reuse; introduces ExpRAG and ReMem (Think/Act/Refine) baselines and robustness/efficiency metrics. |
| cao-remember-me-refine-me | Cao et al. | ReMe: dynamic procedural memory lifecycle (acquire→reuse→refine) with multi-faceted distillation from success/failure trajectories, scenario-aware retrieval, and utility-based pruning; strong BFCL‑V3/AppWorld results (as reported). |
| sarin-memoria | Sarin et al. | Memoria: personalization memory layer combining session summaries + KG triplets (persona) with exponential recency weighting; SQLite + ChromaDB architecture and LongMemEvals subset results. |
| latimer-hindsight | Latimer et al. | Hindsight: retain/recall/reflect architecture separating evidence vs beliefs vs summaries; temporal+entity memory graph with multi-channel retrieval fusion and belief confidence updates; very strong LongMemEval/LoCoMo results (as reported). |
| yu-agentic-memory | Yu et al. | AgeMem: RL-trained unified LTM+STM controller exposing memory ops as tool actions (add/update/delete/retrieve/summarize/filter) with a 3-stage curriculum and step-wise GRPO for credit assignment. |
| hu-evermemos | Hu et al. | EverMemOS: self-organizing “memory OS” with MemCells→MemScenes lifecycle, user profile consolidation, and necessity/sufficiency-guided recollection (verifier + query rewrite); strong LoCoMo/LongMemEval results (as reported). |
| li-timem | Li et al. | TiMem: temporal-hierarchical memory consolidation (segment→session→day→week→profile) with query-complexity recall planning + gating; strong LoCoMo/LongMemEval-S accuracy with low recalled tokens (as reported). |
| zhang-himem | Zhang et al. | HiMem: hierarchical long-term memory split (Episode Memory + Note Memory) with topic+surprise episode segmentation, note-first “best-effort” retrieval w/ sufficiency checks, and conflict-aware reconsolidation; strong LoCoMo results (as reported). |
| behrouz-nested-learning | Behrouz et al. | Nested Learning / CMS / Hope: reframes memory as multi-timescale update dynamics (continuum memory blocks updated at different frequencies) with implications for consolidation and “corrections without forgetting”. |
| zhang-recursive-language-models | Zhang et al. | Recursive Language Models (RLMs): inference-time recursion + REPL state treats long prompts as an external environment; processes multi‑million-token inputs with sub-calls and programmatic slicing, often beating long-context scaffolds at comparable average cost (as reported). |
| wang-m-plus | Wang et al. | M+: latent-space long-term memory extension to MemoryLLM that stores dropped memory tokens in an LTM pool and retrieves them during generation with a co-trained retriever; extends retention to >160k tokens at similar GPU memory cost (as reported). |
| dong-minja | Dong et al. | MINJA: practical memory injection attack on “memory-as-demonstrations” agents via query-only interaction (bridging steps + progressive shortening); motivates write-time gates, isolation, and safer memory representations. |
| sunil-memory-poisoning-attack-defense | Sunil et al. | Memory poisoning attack & defense: empirical MINJA follow-up in EHR agents; shows pre-existing benign memory can reduce ASR, and that trust-score defenses can fail via over-conservatism or overconfidence. |
| anokhin-arigraph | Anokhin et al. | AriGraph: knowledge-graph world model that links episodic observation nodes to extracted semantic triplets; two-stage retrieval (semantic→episodic) for planning/exploration in text-game environments. |
| behrouz-titans | Behrouz et al. | Titans: long-context architecture with an online-updated neural memory module (test-time learning) plus persistent task memory; provides explicit primitives for surprise-based salience and forgetting. |
| ahn-hema | Ahn | HEMA: hippocampus-inspired dual memory for long conversations (running compact summary + FAISS episodic vector store) with explicit prompt budgeting, pruning (“semantic forgetting”), and summary-of-summaries consolidation. |
| tan-membench | Tan et al. | MemBench: benchmark/dataset for agent memory covering participation vs observation scenarios and factual vs reflective memory, with metrics for accuracy/recall/capacity and read/write-time efficiency. |
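A recurring primitive above is Zep/Graphiti-style bi-temporal validity (also adopted by memv): each fact carries both a transaction time (when the system learned it) and a world-time validity interval, and corrections close the interval instead of deleting the row. A minimal sketch under assumed field names; the papers' actual schemas and APIs differ:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Fact:
    """Bi-temporal fact: recorded_at is transaction time (ingestion),
    valid_from/valid_to bound when the fact held in the world.
    Field names here are illustrative, not Graphiti's schema."""
    statement: str
    recorded_at: int
    valid_from: int
    valid_to: Optional[int] = None  # None = still valid

class FactStore:
    def __init__(self):
        self.facts = []

    def assert_fact(self, statement, now):
        self.facts.append(Fact(statement, recorded_at=now, valid_from=now))

    def invalidate(self, statement, now):
        """Correction = close the validity interval, never delete:
        the superseded fact stays queryable 'as of' earlier times."""
        for f in self.facts:
            if f.statement == statement and f.valid_to is None:
                f.valid_to = now

    def current(self, t):
        """Facts valid at world time t."""
        return [f.statement for f in self.facts
                if f.valid_from <= t and (f.valid_to is None or f.valid_to > t)]

store = FactStore()
store.assert_fact("user lives in Tokyo", now=1)
store.invalidate("user lives in Tokyo", now=5)
store.assert_fact("user lives in Osaka", now=5)
# store.current(3) → ["user lives in Tokyo"]; store.current(6) → ["user lives in Osaka"]
```

The payoff is that “what did the agent believe last week?” and “what is true now?” become two different, answerable queries over the same store.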

## Deep Dive Analyses

Root-level critical analyses intended for synthesis work. These reference the summaries above, but focus on coherence, evidence quality, risks, and synthesis-ready claim framing.

| Synthesis | Based on | Focus |
|---|---|---|
| ANALYSIS | ANALYSIS-*.md + shisad docs + Mem0/Letta baselines | Cross-system comparison (techniques + memory types), plus mapping to shisad and “traditional” RAG-ish memory |
| ANALYSIS-academic-industry | paper ANALYSIS-arxiv-*.md + shisad plan | Academic/industry synthesis: benchmarks vs systems vs attacks, with “what’s missing in shisad” framing |
| Benchmarks best practices | Public disputes, audits, our analysis | Known pitfalls, metric confusion, dataset quality issues, per-benchmark limitations |
| MELT benchmark design | ANALYSIS.md systems + Reality Check epistemic docs | Memory Evaluation for Lifecycle Testing — session-replay benchmark testing the full memory lifecycle (decay, consolidation, contradiction, core stability, inference) at 6 scale tiers over simulated time. Separate repo; draft. |
| Analysis | Based on | Focus |
|---|---|---|
| ANALYSIS-jumperz-agent-memory-stack | references/jumperz-agent-memory-stack.md | Checklist critique (semantics, failure modes, missing evaluation), synthesis-ready takeaways + claims table |
| ANALYSIS-joelhooks-adr-0077-memory-system-next-phase | references/joelhooks-adr-0077-memory-system-next-phase.md | Increment plan critique (decay, rewrite, dedup, echo/fizzle), validation plan + claims |
| ANALYSIS-coolmanns-openclaw-memory-architecture | references/coolmanns-openclaw-memory-architecture.md + vendor/openclaw-memory-architecture/ | Layered stack critique with benchmark-method verification, operational risks, doc drift notes |
| ANALYSIS-drag88-agent-output-degradation | references/drag88-agent-output-degradation.md | Convergence + enforcement pattern critique (judge→rule distillation), measurement gaps, risks |
| ANALYSIS-versatly-clawvault | references/versatly-clawvault.md + vendor/clawvault/ | Product/tooling critique (surface area, hooks, qmd dependency), security posture, missing benchmarks |
| ANALYSIS-vstorm-memv | references/vstorm-memv.md + vendor/memv/ | Implementation critique of Nemori-inspired predict-calibrate extraction + bi-temporal validity + hybrid retrieval, with gaps/risks and shisad mapping |
| ANALYSIS-openviking | vendor/openviking/ + Hermes provider docs | Open-source context database: viking:// filesystem, L0/L1/L2 tiered loading, session-commit extraction across 8 memory categories, and hierarchical typed retrieval over memory/resources/skills; strong observability with heavier operational complexity |
| ANALYSIS-byterover-cli | vendor/byterover-cli/ + vendor/byterover-cli/paper/ | Agent-native coding-agent memory/runtime: daemon + per-project agent pool, markdown context tree with explicit relations and lifecycle, 5-tier progressive retrieval with cache/OOD detection, and strong self-reported benchmarks with caveats |
| ANALYSIS-mira-OSS | vendor/mira-OSS/ | Full-stack event-driven agent (v1 rev 2): activity-day sigmoid decay, hub discovery + 3-axis linking (vector+entity+TF-IDF), Text-Based LoRA + user model synthesis with critic validation, background forage agent (sub-agent collaboration), portrait synthesis, 16 tools, context overflow remediation, immutable domain models, multi-user RLS + Vault; gaps in write gating, external benchmarks, taint tracking, and sub-agent capability scoping |
| ANALYSIS-claude-code-memory | Source: /home/lhl/Downloads/claude-code/src | Claude Code memory subsystem (Anthropic): first-party production-scale memory system; flat-file MEMORY.md + typed topic files (user/feedback/project/reference) + background extraction via forked agent with mutual exclusion + LLM-based relevance selection (Sonnet) + team memory with OAuth sync + auto dream consolidation + KAIROS daily-log mode + eval-validated prompts with case IDs + security-hardened path validation; no vector search, no graph, no decay scoring |
| ANALYSIS-codex-memory | openai/codex | Codex memory subsystem (OpenAI): first-party open-source coding agent; two-phase async pipeline (gpt-5.1-codex-mini extraction → gpt-5.3-codex consolidation) + SQLite-backed job coordination (leases/heartbeats/watermarks) + progressive disclosure layout (memory_summary → MEMORY.md → rollout_summaries → skills) + skills as procedural memory + usage-based citation-driven retention + thread-diff incremental forgetting + ~1,400 lines extraction/consolidation prompts; no vector search, no team memory, no real-time extraction |
| ANALYSIS-google-always-on-memory-agent | vendor/always-on-memory-agent/ | Official Google ADK sample: always-on daemon with multimodal ingestion (27 file types via Gemini 3.1 Flash-Lite), periodic LLM consolidation, SQLite storage, HTTP API + Streamlit dashboard; no retrieval/search (recency scan LIMIT 50), no decay/dedup/versioning; useful as ADK orchestration reference and multimodal ingestion pattern |
| ANALYSIS-supermemory | references/supermemory.md + vendor/supermemory/ | Memory-as-a-service startup: memory versioning (linked-list chains via parentMemoryId/rootMemoryId/isLatest), typed relationship ontology (updates/extends/derives), static/dynamic profile synthesis API, time-based forgetting with audit trail, multi-model embedding columns, MemoryBench framework; open-source repo is SDK/frontend only — core engine logic is proprietary hosted backend |
| ANALYSIS-karta | vendor/karta/ | Karta (rohithzr): Rust (~10.4K LOC) agentic memory library with Zettelkasten-inspired knowledge graph, 7-type dream engine (deduction/induction/abduction/consolidation/contradiction/episode digest/cross-episode digest) with inference feedback into retrieval, embedding-based query classification (6 modes), retroactive context evolution with drift protection, cross-encoder reranking with abstention, multi-hop BFS traversal, atomic fact decomposition with per-fact embeddings, foresight signals with TTL, structured episode digests; BEAM 100K: 57.7% with 243-failure root cause catalog |
## Paper Deep Dive Analyses (Academic / Industry)

| Analysis | Based on | Focus |
|---|---|---|
| ANALYSIS-arxiv-2602.01313-evermembench | references/hu-evermembench.md + references/papers/arxiv-2602.01313.pdf | Benchmark critique emphasizing version semantics, multi-party fragmentation, oracle diagnostics, and shisad mapping |
| ANALYSIS-arxiv-2602.02369-live-evo | references/zhang-live-evo.md + references/papers/arxiv-2602.02369.pdf | System deep dive emphasizing online experience weighting from continuous feedback, meta-guidelines for memory compilation, and memory-on vs memory-off utility measurement; shisad mapping for feedback loops + procedural memory gating |
| ANALYSIS-arxiv-2602.11243-structmemeval | references/shutova-structmemeval.md + references/papers/arxiv-2602.11243.pdf | Benchmark deep dive emphasizing memory organization/structure as a distinct capability (trees/ledgers/state), hint vs no-hint diagnostics, and implications for shisad structured-memory primitives |
| ANALYSIS-arxiv-2602.05665-graph-based-agent-memory-taxonomy | references/yang-graph-based-agent-memory-taxonomy.md + references/papers/arxiv-2602.05665.pdf | Survey deep dive providing graph-based memory taxonomy and lifecycle (extract/store/retrieve/evolve), with implications for shisad graph-as-derived-view, operator choices, and maintenance jobs |
| ANALYSIS-arxiv-2404.13501-survey-memory-mechanism | references/zhang-survey-memory-mechanism.md + references/papers/arxiv-2404.13501.pdf | Survey deep dive providing baseline taxonomy and evaluation checklists for agent memory; useful coverage reference alongside newer benchmarks/systems for shisad’s roadmap |
| ANALYSIS-arxiv-2512.13564-memory-age-ai-agents | references/hu-memory-age-ai-agents.md + references/papers/arxiv-2512.13564.pdf | Survey deep dive emphasizing the Forms–Functions–Dynamics taxonomy and frontiers (RL integration, multimodal, multi-agent shared memory, trustworthiness), used as organizing frame for shisad v0.7 memory roadmap |
| ANALYSIS-arxiv-2402.17753-locomo | references/maharana-locomo.md + references/papers/arxiv-2402.17753.pdf | Dataset/benchmark critique with episodic-memory implications (event graphs, multimodal, RAG harm) and shisad mapping |
| ANALYSIS-arxiv-2410.10813-longmemeval | references/wu-longmemeval.md + references/papers/arxiv-2410.10813.pdf | Benchmark and system-design decomposition (indexing/retrieval/reading), with mapping to shisad primitives |
| ANALYSIS-arxiv-2310.08560-memgpt | references/packer-memgpt.md + references/papers/arxiv-2310.08560.pdf | System deep dive emphasizing virtual context management (OS paging), memory tiers (working/queue/recall/archival), function-call memory ops, and implications for shisad versioned corrections + write-policy hardening |
| ANALYSIS-arxiv-2602.10715-locomoplus | references/li-locomoplus.md + references/papers/arxiv-2602.10715.pdf | Beyond-factual “cognitive memory” benchmark critique (latent constraints) and implications for safe constraint/procedural memory |
| ANALYSIS-arxiv-2504.19413-mem0 | references/chhikara-mem0.md + references/papers/arxiv-2504.19413.pdf | System deep dive emphasizing explicit memory ops, graph-memory tradeoffs, deployment metrics (tokens/p95), and shisad mapping (versioned corrections vs delete) |
| ANALYSIS-arxiv-2601.02553-simplemem | references/liu-simplemem.md + references/papers/arxiv-2601.02553.pdf | System deep dive emphasizing write-time semantic structured compression, online consolidation, and intent-aware multi-view retrieval planning; mapping to shisad “derived vs raw” memory + retrieval budgeting |
| ANALYSIS-arxiv-2502.12110-a-mem | references/xu-a-mem.md + references/papers/arxiv-2502.12110.pdf | System deep dive emphasizing Zettelkasten-style notes + LLM-driven linking + memory evolution, with strong multi-hop/temporal LoCoMo gains but high versioning/audit requirements for shisad |
| ANALYSIS-arxiv-2503.21760-meminsight | references/salama-meminsight.md + references/papers/arxiv-2503.21760.pdf | System deep dive emphasizing autonomous attribute mining/annotation as a derived metadata layer to improve retrieval recall and downstream tasks; mapping to shisad schema constraints + provenance/versioning |
| ANALYSIS-arxiv-2511.18423-gam | references/yan-gam.md + references/papers/arxiv-2511.18423.pdf | System deep dive emphasizing just-in-time context compilation via memo index + universal page-store and an iterative deep-research researcher; highlights the latency/quality trade-off and mapping to shisad evidence-first episodic storage |
| ANALYSIS-arxiv-2501.13956-zep | references/rasmussen-zep.md + references/papers/arxiv-2501.13956.pdf | System deep dive emphasizing bi-temporal validity semantics, episodic+semantic+community graph tiers, hybrid retrieval (BM25/embeddings/BFS), and implications for shisad versioned memory |
| ANALYSIS-arxiv-2507.03724-memos | references/li-memos.md + references/papers/arxiv-2507.03724.pdf | System deep dive emphasizing MemCube metadata, multi-substrate memory (plaintext/KV/parameter), lifecycle/scheduling/governance, and mapping to shisad primitives |
| ANALYSIS-arxiv-2508.19828-memory-r1 | references/yan-memory-r1.md + references/papers/arxiv-2508.19828.pdf | RL deep dive emphasizing learned memory ops (ADD/UPDATE/DELETE/NOOP) + post-retrieval memory distillation, reward design, and what’s required to safely adopt this in shisad |
| ANALYSIS-arxiv-2508.03341-nemori | references/nan-nemori.md + references/papers/arxiv-2508.03341.pdf | System deep dive emphasizing episode segmentation (Two-Step Alignment) + predict-calibrate semantic distillation, reported LoCoMo/LongMemEvalS gains, and implications for shisad write gating + correction semantics |
| ANALYSIS-arxiv-2510.08601-mnemosyne | references/jonelagadda-mnemosyne.md + references/papers/arxiv-2510.08601.pdf | System deep dive emphasizing edge-first graph memory, redundancy/refresh, probabilistic decay-based recall, and a fixed-budget core/persona summary; includes evaluation-rigor cautions |
| ANALYSIS-arxiv-2511.12960-engram | references/patel-engram.md + references/papers/arxiv-2511.12960.pdf | System deep dive emphasizing typed memory (episodic/semantic/procedural), deterministic routing/formatting, strict evidence budgets, and strong token/latency results; mapping to shisad primitives |
| ANALYSIS-arxiv-2511.20857-evo-memory | references/wei-evo-memory.md + references/papers/arxiv-2511.20857.pdf | Benchmark deep dive emphasizing streaming task-sequence evaluation for experience reuse, plus refine/prune mechanisms and metrics (robustness, step efficiency) for shisad’s eval harness |
| ANALYSIS-arxiv-2512.10696-remember-me-refine-me | references/cao-remember-me-refine-me.md + references/papers/arxiv-2512.10696.pdf | System deep dive emphasizing procedural memory distillation + scenario-aware reuse + utility-based refinement/pruning; mapping to shisad procedural tier + versioned invalidation vs delete |
| ANALYSIS-arxiv-2512.12686-memoria | references/sarin-memoria.md + references/papers/arxiv-2512.12686.pdf | System deep dive emphasizing persona KG + session summaries with recency-weighted retrieval; highlights missing governance/versioning primitives needed for shisad |
| ANALYSIS-arxiv-2512.12818-hindsight | references/latimer-hindsight.md + references/papers/arxiv-2512.12818.pdf | System deep dive emphasizing retain/recall/reflect with four-network memory (facts/experiences/observations/beliefs), token-budgeted multi-channel retrieval fusion, and belief confidence updates; key shisad mapping |
| ANALYSIS-arxiv-2601.01885-agentic-memory | references/yu-agentic-memory.md + references/papers/arxiv-2601.01885.pdf | RL deep dive emphasizing unified LTM+STM memory ops as tool actions, 3-stage training curriculum, step-wise GRPO credit assignment, and implications for shisad’s future learned memory policies |
| ANALYSIS-arxiv-2601.02163-evermemos | references/hu-evermemos.md + references/papers/arxiv-2601.02163.pdf | System deep dive emphasizing MemCell→MemScene consolidation lifecycle, user profile/foresight, and sufficiency-verified scene-guided retrieval; mapping to shisad consolidation roadmap |
| ANALYSIS-arxiv-2601.02845-timem | references/li-timem.md + references/papers/arxiv-2601.02845.pdf | System deep dive emphasizing temporal-hierarchical consolidation (TMT), query-complexity recall planning/gating, and the accuracy–token frontier; mapping to shisad temporal tiers |
| ANALYSIS-arxiv-2601.06377-himem | references/zhang-himem.md + references/papers/arxiv-2601.06377.pdf | System deep dive emphasizing Episode Memory + Note Memory hierarchy, note-first “best-effort” retrieval w/ sufficiency checks, and conflict-aware reconsolidation; mapping to shisad event→knowledge tiers + versioned updates |
| ANALYSIS-arxiv-2512.24695-nested-learning | references/behrouz-nested-learning.md + references/papers/arxiv-2512.24695.pdf | Conceptual deep dive on multi-timescale “continuum memory” and consolidation dynamics; mapping to shisad tiered memory + versioned corrections |
| ANALYSIS-arxiv-2512.24601-recursive-language-models | references/zhang-recursive-language-models.md + references/papers/arxiv-2512.24601.pdf | Architecture deep dive emphasizing RLM-style programmatic reading/compilation over arbitrarily long evidence stores (REPL + recursion + sub-calls), with implications for shisad sandboxed compilation traces and cost tail management |
| ANALYSIS-arxiv-2502.00592-m-plus | references/wang-m-plus.md + references/papers/arxiv-2502.00592.pdf | Architecture deep dive emphasizing latent-space long-term memory tokens + co-trained retrieval for >160k retention, with mapping to shisad’s external evidence-first memory and retrieval diagnostics |
| ANALYSIS-arxiv-2503.03704-minja | references/dong-minja.md + references/papers/arxiv-2503.03704.pdf | Security deep dive on query-only memory injection attacks; implications for write-policy, provenance/taint, isolation, and “don’t store demonstrations” patterns |
| ANALYSIS-arxiv-2601.05504-memory-poisoning-attack-defense | references/sunil-memory-poisoning-attack-defense.md + references/papers/arxiv-2601.05504.pdf | Security deep dive emphasizing ISR vs ASR under realistic memory conditions, and why trust-score sanitization can fail; concrete shisad hardening takeaways |
| ANALYSIS-arxiv-2407.04363-arigraph | references/anokhin-arigraph.md + references/papers/arxiv-2407.04363.pdf | System deep dive emphasizing episodic↔semantic memory linking, graph-structured retrieval for planning/exploration, and implications for shisad episode objects + provenance + correction semantics |
| ANALYSIS-arxiv-2501.00663-titans | references/behrouz-titans.md + references/papers/arxiv-2501.00663.pdf | Architecture deep dive emphasizing test-time-learning neural memory (surprise/momentum/forgetting), Titans MAC/MAG/MAL variants, and how to translate salience/decay ideas into shisad’s external memory framework |
| ANALYSIS-arxiv-2504.16754-hema | references/ahn-hema.md + references/papers/arxiv-2504.16754.pdf | System deep dive emphasizing dual memory (summary + vector store), explicit prompt budgeting, pruning/consolidation policies, and evaluation-rigor cautions for shisad adoption |
| ANALYSIS-arxiv-2506.21605-membench | references/tan-membench.md + references/papers/arxiv-2506.21605.pdf | Benchmark deep dive emphasizing multi-scenario (participant vs observer) and multi-level (factual vs reflective) evaluation, plus latency/capacity metrics and implications for shisad eval harnesses |
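Several of the deep dives above (Mem0, Memory-R1, AgeMem) share an explicit memory-operation vocabulary: a manager emits ADD/UPDATE/DELETE/NOOP decisions that a dumb executor applies to the store. A minimal sketch of that executor contract over a key→fact dict; the dict shape and guard semantics are illustrative, not either paper's API:

```python
from enum import Enum

class MemOp(Enum):
    """The shared op vocabulary from Mem0 / Memory-R1 / AgeMem."""
    ADD = "ADD"
    UPDATE = "UPDATE"
    DELETE = "DELETE"
    NOOP = "NOOP"

def apply_op(store, op, key, value=None):
    """Apply one memory op to a key->fact store. The executor is
    deliberately conservative: ADD only creates, UPDATE only mutates
    existing keys, DELETE is idempotent, NOOP changes nothing."""
    if op is MemOp.ADD and key not in store:
        store[key] = value
    elif op is MemOp.UPDATE and key in store:
        store[key] = value
    elif op is MemOp.DELETE:
        store.pop(key, None)
    # NOOP and invalid op/key combinations fall through unchanged
    return store

store = apply_op({}, MemOp.ADD, "city", "Tokyo")
store = apply_op(store, MemOp.UPDATE, "city", "Osaka")
# → {"city": "Osaka"}
```

Separating the decision (LLM or RL policy) from this deterministic executor is what makes the op stream auditable, and it is also where versioned-invalidation variants would replace the raw DELETE branch.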

## Source Threads & Links

| Source | URL |
|---|---|
| @jumperz memory stack thread | https://x.com/jumperz/status/2024841165774717031 |
| @joelhooks ADR tweet | https://x.com/joelhooks/status/2024947701738262773 |
| joelclaw ADR-0077 | https://joelclaw.com/adrs/0077-memory-system-next-phase |
| @drag88 article | https://x.com/drag88/status/2022551759491862974 |
| supermemory docs | https://supermemory.ai/docs |
| supermemory repo | https://github.com/supermemoryai/supermemory |
| mempalace repo | https://github.com/milla-jovovich/mempalace |
| karta repo | https://github.com/rohithzr/karta |

## File Tree

agentic-memory/
├── README.md                          ← this file
├── ANALYSIS.md                         ← synthesis + comparison
├── ANALYSIS-academic-industry.md       ← academic/industry synthesis
├── ANALYSIS-jumperz-agent-memory-stack.md
├── ANALYSIS-joelhooks-adr-0077-memory-system-next-phase.md
├── ANALYSIS-coolmanns-openclaw-memory-architecture.md
├── ANALYSIS-drag88-agent-output-degradation.md
├── ANALYSIS-versatly-clawvault.md
├── ANALYSIS-vstorm-memv.md
├── ANALYSIS-mira-OSS.md
├── ANALYSIS-codex-memory.md
├── ANALYSIS-google-always-on-memory-agent.md
├── ANALYSIS-supermemory.md
├── ANALYSIS-karta.md               ← Karta: Rust agentic memory library with dream engine
├── ANALYSIS-mempalace.md           ← not in ANALYSIS.md (claims-vs-code issues); see REVIEWED.md
├── REVIEWED.md                        ← triage log (examined but not promoted to ANALYSIS)
├── PUNCHLIST-academic-industry.md     ← tracking checklist for paper deep dives
├── templates/                         ← templates for paper analyses/summaries
│
├── references/                        ← summarized reference docs (markdown w/ frontmatter)
│   ├── 1-full-agent-memory-build.jpg  ← jumperz card 1: memory storage
│   ├── 2-feeds-into.jpg               ← jumperz card 2: memory intelligence
│   ├── jumperz-agent-memory-stack.md
│   ├── joelhooks-adr-0077-memory-system-next-phase.md
│   ├── coolmanns-openclaw-memory-architecture.md
│   ├── drag88-agent-output-degradation.md
│   ├── versatly-clawvault.md
│   ├── hu-evermembench.md
│   ├── li-locomoplus.md
│   ├── maharana-locomo.md
│   ├── wu-longmemeval.md
│   ├── chhikara-mem0.md
│   └── papers/                        ← archived PDFs + text snapshots
│       ├── README.md
│       ├── arxiv-*.pdf
│       └── arxiv-*.md
│
└── vendor/                            ← cloned source repos
    ├── mira-OSS/                      ← github.com/taylorsatula/mira-OSS (snapshot, AGPLv3)
    │   ├── README.md
    │   ├── CLAUDE.md                  ← project guide (architecture, patterns, principles)
    │   ├── main.py                    ← FastAPI entry point
    │   ├── cns/                       ← Central Nervous System (conversation orchestration)
    │   │   ├── api/                   ← FastAPI endpoints (chat, actions, data, health)
    │   │   ├── core/                  ← Domain models (Continuum, Message, Events)
    │   │   ├── services/              ← Orchestrator, subcortical, summary, collapse handler
    │   │   └── infrastructure/        ← Repositories, Valkey cache, unit of work
    │   ├── lt_memory/                 ← Long-term memory system
    │   │   ├── scoring_formula.sql    ← Multi-factor activity-day sigmoid importance scoring
    │   │   ├── models.py             ← Memory, Entity, ExtractedMemory, link types
    │   │   ├── hybrid_search.py      ← BM25 + pgvector with RRF
    │   │   ├── proactive.py          ← Dual-path retrieval (similarity + hub discovery)
    │   │   ├── hub_discovery.py      ← Entity-driven memory retrieval via pg_trgm
    │   │   └── processing/           ← Extraction, consolidation, entity GC pipelines
    │   ├── working_memory/           ← System prompt composition via trinkets
    │   ├── tools/                    ← Self-registering tool framework (11 built-in)
    │   ├── config/                   ← Pydantic config + prompt templates
    │   └── auth/                     ← WebAuthn + magic link authentication
    │
    ├── openclaw-memory-architecture/  ← github.com/coolmanns/openclaw-memory-architecture
    │   ├── README.md
    │   ├── PROJECT.md
    │   ├── CHANGELOG.md
    │   ├── docs/
    │   │   ├── ARCHITECTURE.md        ← full 12-layer technical reference
    │   │   ├── knowledge-graph.md     ← graph search pipeline, benchmarks
    │   │   ├── context-optimization.md
    │   │   ├── embedding-setup.md
    │   │   ├── benchmark-process.md
    │   │   ├── benchmark-results.md
    │   │   ├── code-search.md
    │   │   └── COMPARISON.md
    │   ├── schema/
    │   │   └── facts.sql              ← SQLite schema for knowledge graph
    │   ├── scripts/                   ← init, seed, search, ingest, decay, benchmark, telemetry
    │   ├── templates/                 ← starter files (active-context, gating-policies, etc.)
    │   └── plugin-graph-memory/       ← OpenClaw plugin (JS)
    │
    ├── karta/                         ← github.com/rohithzr/karta (submodule, MIT)
    │   ├── Cargo.toml                ← workspace: karta-core + karta-cli
    │   ├── crates/
    │   │   └── karta-core/           ← Core engine (~6.7K LOC Rust)
    │   │       ├── src/
    │   │       │   ├── note.rs       ← MemoryNote, Provenance, NoteStatus, AtomicFact, Episode, EpisodeDigest
    │   │       │   ├── write.rs      ← Write path: index, link, evolve, foresight, facts
    │   │       │   ├── read.rs       ← Read path: classify, search, traverse, rerank, synthesize
    │   │       │   ├── rerank.rs     ← Jina/LLM/noop rerankers
    │   │       │   ├── dream/        ← Dream engine: 7 inference types
    │   │       │   ├── store/        ← LanceDB + SQLite implementations
    │   │       │   └── llm/          ← Provider trait + OpenAI + mock + prompts
    │   │       └── tests/            ← eval, beam_100k, bench_beam (~3.8K LOC)
    │   ├── findings.md               ← BEAM 100K detailed failure analysis
    │   └── plan.md                   ← Experiment plan targeting 90%+
    │
    ├── always-on-memory-agent/        ← GoogleCloudPlatform/generative-ai (official ADK sample)
    │   ├── agent.py                  ← ADK multi-agent daemon (ingest/consolidate/query)
    │   ├── dashboard.py              ← Streamlit UI
    │   └── docs/                     ← Logo/architecture assets
    │
    ├── memv/                          ← github.com/vstorm-co/memv
    │   ├── README.md
    │   ├── CHANGELOG.md
    │   ├── pyproject.toml             ← PyPI: memvee, v0.1.0
    │   ├── docs/                      ← docs site (MkDocs)
    │   ├── src/
    │   │   └── memv/                  ← segmentation, extraction, validity, retrieval, storage
    │   └── tests/
    │
    ├── supermemory/                    ← github.com/supermemoryai/supermemory (lean subset: schemas, SDK, MCP, arch docs)
    │   ├── LICENSE
    │   ├── README.md                  ← provenance + open-source vs hosted-backend split
    │   ├── packages/
    │   │   ├── validation/            ← Zod schemas (data model definitions)
    │   │   │   ├── schemas.ts
    │   │   │   └── api.ts
    │   │   ├── lib/
    │   │   │   ├── api.ts             ← reveals backend dependency (api.supermemory.ai)
    │   │   │   └── similarity.ts      ← client-side cosine sim (visualization only)
    │   │   └── tools/src/shared/
    │   │       └── memory-client.ts   ← SDK client (profile search, prompt formatting)
    │   ├── apps/mcp/src/
    │   │   └── server.ts              ← MCP server (memory/recall/whoAmI tools)
    │   └── skills/supermemory/references/
    │       └── architecture.md        ← claimed design (558 lines)
    │
    └── clawvault/                     ← github.com/Versatly/clawvault
        ├── README.md
        ├── PLAN.md                    ← issue #4: ledger, reflect, replay, archive
        ├── CHANGELOG.md
        ├── SKILL.md
        ├── package.json               ← npm: clawvault, v2.6.1
        ├── src/
        │   ├── commands/              ← archive, context, inject, observe, reflect, replay, wake, sleep, task, project, ...
        │   ├── observer/              ← compressor, reflector, router, session-watcher
        │   ├── lib/                   ← vault, memory-graph, ledger, observation-format, session-utils
        │   └── cli/
        ├── bin/                       ← CLI entry + command registration modules
        ├── hooks/                     ← OpenClaw hook handler
        ├── dashboard/                 ← web dashboard (vault parser, graph diff)
        ├── schemas/
        ├── scripts/
        ├── templates/
        └── tests/

Key Themes Across Sources

  • Phased build order matters: Core memory first (write/read/decay), reliability second (dedup/maintenance/recovery), intelligence last (graphs/trust/cross-agent). Building out of order amplifies flaws.
  • Tiered retrieval: Summary files first (fast, cheap), vector search fallback (thorough, expensive). Don't vector-search everything.
  • Score decay: final_score = relevance × exp(-λ × days) — recency-weighted relevance is universal across all architectures.
  • Feedback loops: Echo/fizzle (track which injected memories get used), behavior loops (extract corrections as lessons), learning loops (convert expensive LLM checks into cheap static rules).
  • SQLite over hosted vector DBs: At current scales (1K-5K entries), SQLite + FTS5 + local embeddings outperforms hosted solutions on latency, cost, and operational simplicity.
  • Multi-agent convergence: Shared memory creates homogenization pressure. Workspace isolation + file routing guards help but don't fully solve it.
  • Vault index pattern: Single scannable manifest (one-line descriptions) → load individual entries on demand. One file read instead of N.
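The score-decay formula above can be sketched directly. Expressing λ via a half-life makes the tuning knob intuitive; the 30-day default here is an illustrative assumption, not a value taken from any of the referenced stacks:

```python
import math

def decayed_score(relevance: float, age_days: float,
                  half_life_days: float = 30.0) -> float:
    """Recency-weighted relevance: final_score = relevance * exp(-lambda * days).

    lambda is derived from a half-life: after `half_life_days`, a memory's
    score is halved. The 30-day default is hypothetical.
    """
    lam = math.log(2) / half_life_days
    return relevance * math.exp(-lam * age_days)
```

With the default half-life, a memory scored 1.0 today retrieves at 0.5 after 30 days and 0.25 after 60, which is the recency pressure all three architectures apply in some form.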
