fix: remove agent from running set on AgentSessionNotFound by btucker · Pull Request #2330 · btucker/midtown

btucker · 2026-04-05T16:38:01Z

Summary

When a session resume fails with "No conversation found", AgentSessionNotFound cleared session_id but left the agent in the running set. On daemon restart (event replay), these agents reconstructed as "running" and the daemon tried to interact with them in an infinite crash loop.

Observed: btucker-reviewer had 4 "running" records with stale session IDs, each cycling through resume → "No conversation found" → clear session_id → resume again, spamming the daemon log every few seconds.

Fix: AgentSessionNotFound now also removes the agent from the running set, matching the behavior of AgentStopped.

Test plan

Existing session_not_found_clears_session_id_on_replay test passes
Full test suite: 1536 lib tests, 0 failures
cargo clippy clean

🤖 Generated with Claude Code

When a session resume fails with "No conversation found", the daemon emits AgentSessionNotFound which cleared session_id but left the agent in the running set. On event replay (daemon restart), these agents reconstructed as "running" and the daemon tried to interact with them in an infinite loop. Now AgentSessionNotFound also removes the agent from the running set, so stale agents are properly marked as stopped. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

codecov · 2026-04-05T16:41:10Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 63.68%. Comparing base (05897b0) to head (d11b83e).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2330      +/-   ##
==========================================
- Coverage   63.70%   63.68%   -0.02%     
==========================================
  Files         100      100              
  Lines       38582    38585       +3     
==========================================
- Hits        24578    24574       -4     
- Misses      14004    14011       +7

Files with missing lines	Coverage Δ
src/daemon_v2/projections/agents.rs	`98.63% <100.00%> (+0.02%)`	⬆️

... and 4 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

btucker

Reviewed: fix/session-not-found-running

The fix is correct. AgentSessionNotFound was clearing session_id but leaving the agent in the running set, which meant on event replay the daemon would reconstruct these agents as "running" and loop trying to interact with a gone session. Adding self.running.remove(&agent_id) here matches the AgentStopped handler at line 116 — good symmetry.

One thing to tighten up:

Test gap — session_not_found_clears_session_id_on_replay (agents_tests.rs:280) only asserts that session_id is None after the event. It doesn't verify the agent was removed from the running set, which is the actual bug this PR fixes. Worth adding:

assert!(
    idx.is_running("a1") == false,  // or !idx.running.contains("a1")
    "AgentSessionNotFound must remove agent from running set"
);

Without that, the test would still pass even if the running.remove line were reverted.

Otherwise LGTM — the fix is correct, scoped, and matches the existing pattern.

🌃 Co-built with Midtown

btucker

Reviewed — the fix is correct and directly addresses the crash loop. Approve.

Two optional follow-ups (not blocking):

Test gap: session_not_found_clears_session_id_on_replay only asserts session_id is cleared. Worth adding an assertion that the agent is removed from the running set, since that's the actual bug this fixes.
Minor inconsistency with AgentStopped: AgentStopped also clears pid and sets stopped_at. AgentSessionNotFound doesn't. The agent is effectively stopped but retains a stale pid and has no stopped_at timestamp. Probably fine in practice, but could show misleading info in status output.

🌃 Co-built with Midtown

btucker · 2026-04-07T05:49:03Z

Review

Fix is correct. AgentSessionNotFound clearing session_id without removing from running meant event replay would reconstruct these agents as running (no session_id, but in running set), causing the infinite resume loop described.

The one-line addition self.running.remove(&agent_id) mirrors what AgentStopped does at line 116.

Nit (non-blocking): The existing test session_not_found_clears_session_id_on_replay should also assert !idx.running.contains("a1") to cover the new behavior. Right now the test only checks session_id is cleared but not the running set removal — so a future regression could remove this line and tests would still pass.

LGTM — nice catch on the root cause from the 4 stale btucker-reviewer records.

🌃 Co-built with Midtown

btucker

Verdict: Approve with minor suggestions

The fix is correct and well-targeted. AgentSessionNotFound clearing session_id without removing from running is a clear bug — on event replay, the agent reconstructs as "running" with no session, causing the infinite resume loop described. Adding self.running.remove(&agent_id) breaks that cycle.

Two minor suggestions:

Consider also clearing pid and setting stopped_at — AgentStopped does both (agent.pid = None; agent.stopped_at = Some(Utc::now())). Without these, an agent removed from running via AgentSessionNotFound will still have a stale pid and no stopped_at timestamp. This shouldn't cause functional issues since decision functions check running.contains(), but it leaves the Agent record in an inconsistent state that could confuse debugging or future code that inspects pid/stopped_at directly.
The existing test should assert on running state — session_not_found_clears_session_id_on_replay doesn't verify the agent was removed from the running set, which is the actual bug this PR fixes. Adding assert!(!idx.running.contains("a1")) would pin the new behavior.

Neither blocks merge — the crash loop fix is the important part and it's correct.

🌃 Co-built with Midtown

btucker commented Apr 5, 2026

View reviewed changes

btucker commented Apr 7, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: remove agent from running set on AgentSessionNotFound#2330

fix: remove agent from running set on AgentSessionNotFound#2330
btucker wants to merge 1 commit intomainfrom
fix/session-not-found-running

btucker commented Apr 5, 2026

Uh oh!

codecov bot commented Apr 5, 2026

Uh oh!

btucker left a comment

Uh oh!

btucker left a comment

Uh oh!

btucker commented Apr 7, 2026

Uh oh!

btucker left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

btucker commented Apr 5, 2026

Summary

Test plan

Uh oh!

codecov bot commented Apr 5, 2026

Codecov Report

Uh oh!

btucker left a comment

Choose a reason for hiding this comment

Uh oh!

btucker left a comment

Choose a reason for hiding this comment

Uh oh!

btucker commented Apr 7, 2026

Review

Uh oh!

btucker left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant