Skip to content

fix: guard extract_memory_structure HSET and gate semantic dedup on setting#184

Open
Piotr1215 wants to merge 8 commits intoredis:mainfrom
Piotr1215:fix/extract-memory-structure-orphan-race
Open

fix: guard extract_memory_structure HSET and gate semantic dedup on setting#184
Piotr1215 wants to merge 8 commits intoredis:mainfrom
Piotr1215:fix/extract-memory-structure-orphan-race

Conversation

@Piotr1215
Copy link
Contributor

@Piotr1215 Piotr1215 commented Mar 6, 2026

Problem

Two related bugs that cause orphaned Redis keys and unexpected semantic deduplication behavior.

Bug 1: Orphaned keys from extract_memory_structure race condition

extract_memory_structure (background task) does a raw HSET on the memory key to update topics/entities. If the memory was deleted between when the task was scheduled and when it executes (e.g., by semantic deduplication merging it into another memory), the HSET recreates the key as an orphaned hash with only topics and entities fields — no text, id_, memory_type, or vector embedding.

These orphaned keys:

  • Are returned by FT.SEARCH queries (they match the index prefix)
  • Have no usable content (empty text, empty id)
  • Cannot be deleted via the delete_long_term_memories API (no id to reference)
  • Accumulate over time as dedup/compaction runs

Reproduction: Create memories that trigger semantic deduplication. The deleted duplicates' pending extract_memory_structure tasks will recreate their keys as orphans.

Evidence from production: Found 19 orphaned keys in Redis — all had only topics and entities fields, confirmed via HKEYS.

Bug 2: Semantic dedup during indexing ignores compact_semantic_duplicates setting

index_long_term_memories calls deduplicate_by_semantic_search unconditionally when deduplicate=True. The compact_semantic_duplicates setting (wired to COMPACT_SEMANTIC_DUPLICATES env var) only gates periodic compaction in compact_long_term_memories, not this indexing-time path.

Setting COMPACT_SEMANTIC_DUPLICATES=false gives a false sense of safety — semantic dedup still runs on every memory write.

Fix

  1. Atomic Lua guard in extract_memory_structure — a Lua script performs EXISTS + HSET atomically so there is no TOCTOU window where a concurrent delete could slip between the check and the write.

  2. Gate semantic dedup in index_long_term_memories on settings.compact_semantic_duplicates — respect the setting consistently across both code paths.

Testing

  • test_extract_memory_structure — verifies the Lua script is called with correct key and pipe-separated values
  • test_extract_memory_structure_skips_deleted_key — verifies no update when Lua returns 0 (key gone)
  • test_compact_semantic_duplicates_env_var — verifies env var parsing
  • test_index_skips_semantic_dedup_when_disabled — verifies deduplicate_by_semantic_search is not called when setting is False

Note

Medium Risk
Touches Redis write semantics and deduplication behavior during indexing/compaction, which can affect data integrity and memory merge behavior if misconfigured. Changes are localized and backed by new regression tests.

Overview
Prevents a race where background extract_memory_structure could recreate deleted long-term memory keys as orphaned Redis hashes by switching the update to HSETEX with FXX (only update existing hashes) and keeping TTL, logging when updates are skipped.

Introduces a compact_semantic_duplicates setting (env-driven) and makes semantic deduplication respect it consistently, including the indexing-time path (not just periodic compaction). Adds regression tests for both behaviors; the example notebook changes are formatting/cleanup only.

Written by Cursor Bugbot for commit 41e8395. This will update automatically on new commits. Configure here.

Copilot AI review requested due to automatic review settings March 6, 2026 20:08
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes two Redis-backed long-term memory issues: (1) background extraction recreating deleted memory hashes (orphan keys), and (2) indexing-time semantic deduplication ignoring the COMPACT_SEMANTIC_DUPLICATES / compact_semantic_duplicates setting.

Changes:

  • Add an existence guard before HSET in extract_memory_structure to avoid updating deleted memory keys.
  • Make compact_long_term_memories default semantic dedup behavior follow settings.compact_semantic_duplicates.
  • Gate indexing-time semantic deduplication on settings.compact_semantic_duplicates, and add a config+test for the env var.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
agent_memory_server/long_term_memory.py Adds Redis key-existence guard for topic/entity updates; aligns semantic dedup behavior with settings in both compaction and indexing paths.
agent_memory_server/config.py Introduces compact_semantic_duplicates setting with default True.
tests/test_long_term_memory.py Adds a regression test for env var parsing; minor assert formatting changes.
Comments suppressed due to low confidence (1)

agent_memory_server/long_term_memory.py:1039

  • The new semantic-dedup gate (settings.compact_semantic_duplicates) in the indexing path isn’t exercised by tests. Consider adding a unit test that sets the setting/env var to false, calls index_long_term_memories(..., deduplicate=True), and asserts deduplicate_by_semantic_search is not invoked (e.g., via patching/spying).
            # Check for semantic duplicates (respects compact_semantic_duplicates setting)
            if not was_deduplicated and settings.compact_semantic_duplicates:
                deduped_memory, was_merged = await deduplicate_by_semantic_search(
                    memory=current_memory,
                    redis_client=redis,
                    vector_distance_threshold=vector_distance_threshold,
                )

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +526 to +532
def test_compact_semantic_duplicates_env_var(self, monkeypatch):
"""Regression: COMPACT_SEMANTIC_DUPLICATES env var must affect runtime behavior."""
monkeypatch.setenv("COMPACT_SEMANTIC_DUPLICATES", "false")
from agent_memory_server.config import Settings

s = Settings()
assert s.compact_semantic_duplicates is False
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test only asserts that Settings() parses COMPACT_SEMANTIC_DUPLICATES=false; it doesn’t verify any runtime behavior that depends on the module-level settings instance (e.g., the indexing-time semantic dedup gate). Either adjust the docstring to match what’s being tested, or extend the test to cover the actual behavior change.

Copilot uses AI. Check for mistakes.
@vishal-bala vishal-bala self-requested a review March 13, 2026 09:37
@Piotr1215 Piotr1215 force-pushed the fix/extract-memory-structure-orphan-race branch 2 times, most recently from 00f7c70 to ee17d9c Compare March 13, 2026 12:55
@Piotr1215
Copy link
Contributor Author

Piotr1215 commented Mar 15, 2026

hey @vishal-bala, swapped the lua script for HSETEX with FXX as suggested. fields are always present on valid hashes (set at creation time), so FXX skips the update cleanly when the key's been deleted by dedup.

the 39 CI test failures are OPENAI_API_KEY 401s, GitHub doesn't expose secrets to fork PRs. same tests pass on main.

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

@vishal-bala vishal-bala self-requested a review March 17, 2026 13:13
@vishal-bala
Copy link
Contributor

It looks like you have some conflicts with recently merged PRs - could you resolve those? We should be good to go once that's done!

the compact_semantic_duplicates parameter in compact_long_term_memories()
was hardcoded to True with no corresponding Settings field. setting
COMPACT_SEMANTIC_DUPLICATES=false as an env var had no effect because
pydantic-settings never read it. add the field to Settings so the
docket perpetual scheduler respects the env var.
the default was evaluated at function definition time, meaning
test-time patches and runtime config reloads had no effect.
use None sentinel with runtime resolution instead.

add regression test for COMPACT_SEMANTIC_DUPLICATES env var parsing.
make format uses a different ruff version than the pre-commit hook
(v0.3.2). re-ran with pre-commit run --all-files to match CI.
…etting

two bugs:

1. extract_memory_structure does a raw HSET on the memory key without
   checking if it still exists. when semantic dedup (or compaction)
   deletes a merged memory between scheduling and execution of the
   background extraction task, the HSET recreates the key as an orphaned
   hash with only topics/entities fields — no text, id, type, or vector.
   these ghosts pollute search results and can't be deleted via the API.

   fix: check redis.exists() before HSET.

2. index_long_term_memories runs deduplicate_by_semantic_search
   unconditionally when deduplicate=True, ignoring the
   compact_semantic_duplicates setting. this means
   COMPACT_SEMANTIC_DUPLICATES=false only disables periodic compaction
   but not dedup-on-write, which is surprising.

   fix: gate semantic dedup on settings.compact_semantic_duplicates.
the exists()+hset() pattern had a TOCTOU race window where semantic
deduplication could delete the key between the two calls, recreating
the orphan. a lua script performs both operations atomically.

also adds tests for the deleted-key guard and the indexing-time
semantic dedup gate (compact_semantic_duplicates=false).

addresses review feedback from cursor bugbot and copilot.
per reviewer feedback, avoid server-side lua scripts in production.
HSETEX with FXX achieves the same atomic guard — fields only update
if they already exist on the hash, so deleted keys are never recreated.
@Piotr1215 Piotr1215 force-pushed the fix/extract-memory-structure-orphan-race branch from 88575d1 to 41e8395 Compare March 18, 2026 20:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants