fix: guard extract_memory_structure HSET and gate semantic dedup on setting #184
Piotr1215 wants to merge 8 commits into redis:main
Conversation
Pull request overview
Fixes two Redis-backed long-term memory issues: (1) background extraction recreating deleted memory hashes (orphan keys), and (2) indexing-time semantic deduplication ignoring the COMPACT_SEMANTIC_DUPLICATES / compact_semantic_duplicates setting.
Changes:
- Add an existence guard before `HSET` in `extract_memory_structure` to avoid updating deleted memory keys.
- Make the default semantic dedup behavior of `compact_long_term_memories` follow `settings.compact_semantic_duplicates`.
- Gate indexing-time semantic deduplication on `settings.compact_semantic_duplicates`, and add a config entry and a test for the env var.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `agent_memory_server/long_term_memory.py` | Adds a Redis key-existence guard for topic/entity updates; aligns semantic dedup behavior with settings in both the compaction and indexing paths. |
| `agent_memory_server/config.py` | Introduces the `compact_semantic_duplicates` setting with default `True`. |
| `tests/test_long_term_memory.py` | Adds a regression test for env var parsing; minor assert formatting changes. |
Comments suppressed due to low confidence (1)
agent_memory_server/long_term_memory.py:1039
- The new semantic-dedup gate (`settings.compact_semantic_duplicates`) in the indexing path isn't exercised by tests. Consider adding a unit test that sets the setting/env var to false, calls `index_long_term_memories(..., deduplicate=True)`, and asserts `deduplicate_by_semantic_search` is not invoked (e.g., via patching/spying).
```python
# Check for semantic duplicates (respects compact_semantic_duplicates setting)
if not was_deduplicated and settings.compact_semantic_duplicates:
    deduped_memory, was_merged = await deduplicate_by_semantic_search(
        memory=current_memory,
        redis_client=redis,
        vector_distance_threshold=vector_distance_threshold,
    )
```
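For reference, a self-contained sketch of the kind of test the suppressed comment suggests. The names here (`FakeSettings`, `index_memory`) are stand-ins for illustration, not the project's real module layout; an actual test would patch `deduplicate_by_semantic_search` in `agent_memory_server.long_term_memory` instead.

```python
import asyncio
from unittest.mock import AsyncMock


class FakeSettings:
    """Stand-in for the module-level settings instance."""
    compact_semantic_duplicates = True


settings = FakeSettings()


async def index_memory(memory, dedup_fn, deduplicate=True):
    """Stand-in for the indexing path: only dedup when the setting allows it."""
    if deduplicate and settings.compact_semantic_duplicates:
        return await dedup_fn(memory)
    return memory, False


async def check_gate():
    dedup = AsyncMock(return_value=("merged", True))

    settings.compact_semantic_duplicates = False
    assert await index_memory("m1", dedup) == ("m1", False)
    dedup.assert_not_awaited()  # gate skipped the dedup call entirely

    settings.compact_semantic_duplicates = True
    assert await index_memory("m1", dedup) == ("merged", True)
    dedup.assert_awaited_once_with("m1")


asyncio.run(check_gate())
```

The key point is spying on the dedup coroutine rather than only checking the parsed config value, which is exactly the gap the reviewer calls out below.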
```python
def test_compact_semantic_duplicates_env_var(self, monkeypatch):
    """Regression: COMPACT_SEMANTIC_DUPLICATES env var must affect runtime behavior."""
    monkeypatch.setenv("COMPACT_SEMANTIC_DUPLICATES", "false")
    from agent_memory_server.config import Settings

    s = Settings()
    assert s.compact_semantic_duplicates is False
```
This test only asserts that `Settings()` parses `COMPACT_SEMANTIC_DUPLICATES=false`; it doesn't verify any runtime behavior that depends on the module-level `settings` instance (e.g., the indexing-time semantic dedup gate). Either adjust the docstring to match what's being tested, or extend the test to cover the actual behavior change.
Force-pushed from 00f7c70 to ee17d9c
hey @vishal-bala, swapped the lua script for HSETEX with FXX as suggested. fields are always present on valid hashes (set at creation time), so FXX skips the update cleanly when the key's been deleted by dedup. the 39 CI test failures are

It looks like you have some conflicts with recently merged PRs - could you resolve those? We should be good to go once that's done!
the compact_semantic_duplicates parameter in compact_long_term_memories() was hardcoded to True with no corresponding Settings field. setting COMPACT_SEMANTIC_DUPLICATES=false as an env var had no effect because pydantic-settings never read it. add the field to Settings so the docket perpetual scheduler respects the env var.
the default was evaluated at function definition time, meaning test-time patches and runtime config reloads had no effect. use None sentinel with runtime resolution instead. add regression test for COMPACT_SEMANTIC_DUPLICATES env var parsing.
make format uses a different ruff version than the pre-commit hook (v0.3.2). re-ran with pre-commit run --all-files to match CI.
…etting

two bugs:

1. extract_memory_structure does a raw HSET on the memory key without checking if it still exists. when semantic dedup (or compaction) deletes a merged memory between scheduling and execution of the background extraction task, the HSET recreates the key as an orphaned hash with only topics/entities fields — no text, id, type, or vector. these ghosts pollute search results and can't be deleted via the API. fix: check redis.exists() before HSET.

2. index_long_term_memories runs deduplicate_by_semantic_search unconditionally when deduplicate=True, ignoring the compact_semantic_duplicates setting. this means COMPACT_SEMANTIC_DUPLICATES=false only disables periodic compaction but not dedup-on-write, which is surprising. fix: gate semantic dedup on settings.compact_semantic_duplicates.
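The exists-then-write guard from this commit can be sketched as follows. `FakeRedis` is a toy stand-in for `redis.asyncio.Redis` so the example is runnable; as the next commit notes, the two separate round trips still leave a small race window.

```python
import asyncio


async def guarded_hset(redis, key: str, mapping: dict) -> bool:
    """Update topic/entity fields only if the memory hash still exists.

    Note: exists() + hset() is two round trips, so a concurrent delete can
    still land between them (the TOCTOU window closed in later commits).
    """
    if not await redis.exists(key):
        return False  # key was deleted (e.g. merged away by dedup); skip
    await redis.hset(key, mapping=mapping)
    return True


class FakeRedis:
    """Tiny in-memory stand-in for redis.asyncio.Redis, illustration only."""

    def __init__(self):
        self.store = {}

    async def exists(self, key):
        return int(key in self.store)

    async def hset(self, key, mapping):
        self.store.setdefault(key, {}).update(mapping)


async def demo():
    r = FakeRedis()
    r.store["memory:1"] = {"text": "hello", "id_": "1"}
    assert await guarded_hset(r, "memory:1", {"topics": "greetings"})
    assert not await guarded_hset(r, "memory:gone", {"topics": "x"})
    assert "memory:gone" not in r.store  # no orphan hash recreated


asyncio.run(demo())
```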
the exists()+hset() pattern had a TOCTOU race window where semantic deduplication could delete the key between the two calls, recreating the orphan. a lua script performs both operations atomically. also adds tests for the deleted-key guard and the indexing-time semantic dedup gate (compact_semantic_duplicates=false). addresses review feedback from cursor bugbot and copilot.
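A minimal sketch of such a script, using redis-py's `register_script` API; the script body and the `mapping_to_argv` helper are illustrative, not the PR's exact code.

```python
# Lua body: EXISTS + HSET execute atomically inside Redis, closing the
# TOCTOU window left by a client-side exists()/hset() pair.
GUARD_HSET_LUA = """
if redis.call('EXISTS', KEYS[1]) == 1 then
    redis.call('HSET', KEYS[1], unpack(ARGV))
    return 1
end
return 0
"""


def mapping_to_argv(mapping: dict) -> list:
    """Flatten {'topics': 't', 'entities': 'e'} into HSET's field/value ARGV."""
    argv = []
    for field, value in mapping.items():
        argv.extend([field, value])
    return argv


# Usage with redis-py (sketch; requires a live client):
#   script = redis_client.register_script(GUARD_HSET_LUA)
#   updated = await script(
#       keys=[memory_key],
#       args=mapping_to_argv({"topics": topics_str, "entities": entities_str}),
#   )
#   if not updated:
#       logger.info("memory %s deleted before extraction; skipping", memory_key)
```

`register_script` caches the script's SHA and falls back to `EVAL` when the server doesn't have it loaded, which is why the later commit's move to `HSETEX` (no server-side scripting) was preferred by reviewers.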
per reviewer feedback, avoid server-side lua scripts in production. HSETEX with FXX achieves the same atomic guard — fields only update if they already exist on the hash, so deleted keys are never recreated.
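A hedged sketch of what that invocation might look like. The argument order follows my reading of the Redis 8.0 `HSETEX` syntax (`FXX`, a TTL option such as `KEEPTTL`, then `FVS numfields field value ...`); verify it against your server version's `HSETEX` documentation before relying on it.

```python
def build_hsetex_args(key: str, mapping: dict) -> list:
    """Build an HSETEX invocation that only touches pre-existing fields.

    Assumed command shape (Redis 8.0, unverified here):
        HSETEX key FXX KEEPTTL FVS numfields field value [field value ...]
    FXX makes the write a no-op unless the given fields already exist, so a
    hash deleted by dedup is never recreated; KEEPTTL preserves any TTL.
    """
    args = ["HSETEX", key, "FXX", "KEEPTTL", "FVS", str(len(mapping))]
    for field, value in mapping.items():
        args.extend([field, value])
    return args


# Usage (sketch): updated = await redis.execute_command(*build_hsetex_args(...))
args = build_hsetex_args("memory:1", {"topics": "t", "entities": "e"})
assert args == ["HSETEX", "memory:1", "FXX", "KEEPTTL", "FVS", "2",
                "topics", "t", "entities", "e"]
```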
Force-pushed from 88575d1 to 41e8395

Problem
Two related bugs that cause orphaned Redis keys and unexpected semantic deduplication behavior.
Bug 1: Orphaned keys from extract_memory_structure race condition
`extract_memory_structure` (background task) does a raw `HSET` on the memory key to update topics/entities. If the memory was deleted between when the task was scheduled and when it executes (e.g., by semantic deduplication merging it into another memory), the `HSET` recreates the key as an orphaned hash with only `topics` and `entities` fields — no `text`, `id_`, `memory_type`, or vector embedding. These orphaned keys:
- match the index prefix, so they pollute `FT.SEARCH` queries
- can't be removed via the `delete_long_term_memories` API (no id to reference)

Reproduction: Create memories that trigger semantic deduplication. The deleted duplicates' pending `extract_memory_structure` tasks will recreate their keys as orphans.

Evidence from production: Found 19 orphaned keys in Redis — all had only `topics` and `entities` fields, confirmed via `HKEYS`.

Bug 2: Semantic dedup during indexing ignores compact_semantic_duplicates setting

`index_long_term_memories` calls `deduplicate_by_semantic_search` unconditionally when `deduplicate=True`. The `compact_semantic_duplicates` setting (wired to the `COMPACT_SEMANTIC_DUPLICATES` env var) only gates periodic compaction in `compact_long_term_memories`, not this indexing-time path. Setting `COMPACT_SEMANTIC_DUPLICATES=false` gives a false sense of safety — semantic dedup still runs on every memory write.

Fix
Atomic Lua guard in `extract_memory_structure` — a Lua script performs `EXISTS` + `HSET` atomically so there is no TOCTOU window where a concurrent delete could slip between the check and the write.

Gate semantic dedup in `index_long_term_memories` on `settings.compact_semantic_duplicates` — respect the setting consistently across both code paths.

Testing

- `test_extract_memory_structure` — verifies the Lua script is called with the correct key and pipe-separated values
- `test_extract_memory_structure_skips_deleted_key` — verifies no update when Lua returns 0 (key gone)
- `test_compact_semantic_duplicates_env_var` — verifies env var parsing
- `test_index_skips_semantic_dedup_when_disabled` — verifies `deduplicate_by_semantic_search` is not called when the setting is False

Note
Medium Risk
Touches Redis write semantics and deduplication behavior during indexing/compaction, which can affect data integrity and memory merge behavior if misconfigured. Changes are localized and backed by new regression tests.
Overview
Prevents a race where background `extract_memory_structure` could recreate deleted long-term memory keys as orphaned Redis hashes by switching the update to `HSETEX` with `FXX` (only update existing hashes) and keeping TTL, logging when updates are skipped.

Introduces a `compact_semantic_duplicates` setting (env-driven) and makes semantic deduplication respect it consistently, including the indexing-time path (not just periodic compaction). Adds regression tests for both behaviors; the example notebook changes are formatting/cleanup only.

Written by Cursor Bugbot for commit 41e8395.