fix: include http_request_id in request-wise priming event IDs #799
Conversation
Force-pushed df297e0 to b881409
Force-pushed b881409 to f15b728
Hi @DaleSeo,
Thanks for putting in these changes.
Keeping the cache after completion works.
But there is an edge case: with two overlapping parallel calls, things don't work. In fact, the client may go into a loop waiting for a response, because when a stream is not found we fall back to `resume_or_shadow_common`.
I've created a test that demonstrates this at: https://github.com/binahm/rust-sdk/blob/fix/priming-event-ids/crates/rmcp/tests/test_streamable_http_priming.rs. I added two tests to your branch:
- test_long_running_tool_single_via_mcp_client (passes)
- test_long_running_tool_parallel_via_mcp_client (fails)
I've also added comments to the PR code that try to explain what I've seen.
    "Request-wise channel completed, falling back to common channel"
    "Request-wise channel not found, falling back to common channel"
    );
    self.resume_or_shadow_common(last_event_id.index).await
Is it correct to fall back to `resume_or_shadow_common` here? If an `http_request_id` was provided and it is not found, shouldn't we return an error? It seems to me that this will deliver messages from a different stream than the one the client expects.
From the spec:
The server MUST NOT replay messages that would have been delivered on a different stream.
Good catch, @glicht. Resume now returns `SessionError::ChannelClosed` when the `http_request_id` is provided but not found in `tx_router`. The tower handler catches the error and creates a fresh standalone stream.
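The behavior described in this reply can be sketched roughly as follows. `parse_last_event_id`, `resume`, and the event-storage shape of `tx_router` here are illustrative stand-ins for the SDK's internals, not its actual API:

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum SessionError {
    ChannelClosed,
}

// Parse a Last-Event-ID of the form "<index>" or "<index>/<http_request_id>".
fn parse_last_event_id(s: &str) -> Option<(u64, Option<u64>)> {
    match s.split_once('/') {
        Some((idx, req)) => Some((idx.parse().ok()?, Some(req.parse().ok()?))),
        None => Some((s.parse().ok()?, None)),
    }
}

// If the ID names a request-wise stream that is no longer routed,
// fail with ChannelClosed instead of replaying the common channel.
fn resume(
    tx_router: &HashMap<u64, Vec<String>>,
    last_event_id: &str,
) -> Result<Vec<String>, SessionError> {
    let (index, http_request_id) =
        parse_last_event_id(last_event_id).ok_or(SessionError::ChannelClosed)?;
    match http_request_id {
        Some(id) => match tx_router.get(&id) {
            // Replay cached events from the requested position onward.
            Some(events) => Ok(events[index as usize..].to_vec()),
            // Request-wise stream is gone: surface an error rather than
            // falling back to a different stream's messages.
            None => Err(SessionError::ChannelClosed),
        },
        // Standalone stream: replay is handled elsewhere in this sketch.
        None => Ok(Vec::new()),
    }
}
```

The key design point is that a named-but-missing stream is treated as an error, which keeps the spec's "MUST NOT replay messages from a different stream" requirement intact.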
    async fn establish_request_wise_channel(
        &mut self,
    ) -> Result<StreamableHttpMessageReceiver, SessionError> {
        self.tx_router.retain(|_, rw| !rw.tx.tx.is_closed());
If I understand correctly, the code assumes that once a new stream is created, we can discard streams that were closed (including those that experienced a disconnect). This can lead to a scenario where a stream is discarded before it was fully consumed, for example if the client performed another request while waiting to issue the resume GET request. Example flow:
Time 0: Client issues req A (long running task that takes for example 10 seconds)
Time 9: Client receives a disconnect and will now wait 3 seconds before performing the GET resume request
Time 11: Client issues req B. Req A is discarded because `tx.is_closed()`
Time 12: Client sends the GET request to resume the stream from req A, but the stream is not found.
Additionally, there is a memory risk here. If a client never creates a new stream, the router is never cleaned out, which may lead to unnecessary memory consumption when many clients maintain a session but are not active.
Maybe a different approach is needed, such as cleaning out `HttpRequestWise` entries after a timeout period once they have completed.
You're right, the retain on new-channel creation was too aggressive. I replaced it with timeout-based eviction. Let me know if this version works for you.
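A minimal sketch of the timeout-based eviction mentioned here, assuming a `completed_at` timestamp that is set only when the response was actually delivered (the `Entry` type and `evict_expired` function are hypothetical, not the crate's real names):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Stand-in for a request-wise router entry; `completed_at` is None while
// the request is in flight (including after a bare client disconnect).
struct Entry {
    completed_at: Option<Instant>,
}

// Evict only entries whose completion is older than the TTL, so a stream
// that merely disconnected can still be resumed later.
fn evict_expired(router: &mut HashMap<u64, Entry>, ttl: Duration, now: Instant) {
    router.retain(|_, entry| match entry.completed_at {
        None => true, // still in flight (or merely disconnected): keep
        Some(done) => now.duration_since(done) < ttl,
    });
}
```

Unlike the eager `retain(|_, rw| !rw.tx.tx.is_closed())`, this keeps the req A entry alive through the client's 3-second reconnect wait in the timeline above.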
    // Resume existing request-wise channel
    let channel = tokio::sync::mpsc::channel(self.session_config.channel_capacity);
    let (tx, rx) = channel;
    let was_completed = request_wise.tx.tx.is_closed();
If I understand correctly, `tx.is_closed()` doesn't necessarily mean the request completed. A disconnect from the client will also cause `tx.is_closed()`. If there was a disconnect and the request was not completed, the stream should be left active to send messages as the request continues processing. I think we need an additional `completed` field on `HttpRequestWise`, set at `unregister_resource`.
I agree. Added `completed_at: Option<Instant>` to `HttpRequestWise`, set explicitly in `unregister_resource` when the response is delivered.
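The distinction between a closed sender and a completed request could look roughly like this; `HttpRequestWise`, `mark_completed`, and `is_completed` here are simplified stand-ins for the SDK's internals:

```rust
use std::time::Instant;

// A closed mpsc sender only says the receiver went away, which could be a
// client disconnect mid-request. Completion is therefore recorded
// explicitly when the response is delivered.
struct HttpRequestWise {
    completed_at: Option<Instant>,
}

impl HttpRequestWise {
    fn new() -> Self {
        Self { completed_at: None }
    }

    // Invoked from the unregister path once the response is delivered.
    fn mark_completed(&mut self) {
        self.completed_at = Some(Instant::now());
    }

    // A disconnected-but-unfinished request reports false here even
    // though its channel sender may already be closed.
    fn is_completed(&self) -> bool {
        self.completed_at.is_some()
    }
}
```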
Force-pushed 750332d to 3e67b4f
Force-pushed 3e67b4f to 559120d
Thanks for the thorough review, @glicht. I've pushed updates addressing all three points. Your long-running tool tests also pass locally, and I wanted to add them to this PR, but they kept failing in CI due to a client reconnect issue on Ubuntu runners. The server-side behavior should be covered by the existing tests and manual validation.
glicht left a comment
It looks very good.
I added a comment regarding the fall-through to a standalone stream when the resume stream is not found.
    // EventSource to retry with the same Last-Event-ID in an
    // infinite loop. Logging at warn so malformed IDs or
    // unexpected failures remain visible.
    tracing::warn!("Resume failed ({e}), creating standalone stream");
Not sure I understand why we leave this connection open and fall through to the standalone stream. I think we should just return a success and close, if the concern is an error causing an infinite retry loop. Maybe check if the error is of type `ChannelClosed` and then return success; for a different error we probably want to return an error to the client. Returning the standalone stream leads to a case where the client asks for messages of a specific stream but gets messages from the standalone common channel (not sure this is what the client expects).
Thanks for calling this out, @glicht. This was actually the root cause of our CI test failures. 🤦♂️ `.text().await` was hanging because the standalone stream never closed.
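The close-cleanly-on-`ChannelClosed` behavior suggested above might be sketched like this, with simplified stand-in types rather than the SDK's actual handler signature:

```rust
#[derive(Debug)]
enum SessionError {
    ChannelClosed,
    Other(String),
}

enum ResumeOutcome {
    Replay(Vec<String>), // messages from the stream the client asked for
    CloseCleanly,        // stream gone: succeed and close, no standalone fallback
    ClientError(String), // genuine failure: surface it to the client
}

// A ChannelClosed on resume ends the SSE response cleanly, which stops
// EventSource from retrying the same Last-Event-ID forever; other errors
// propagate instead of silently switching the client to the common channel.
fn on_resume(result: Result<Vec<String>, SessionError>) -> ResumeOutcome {
    match result {
        Ok(messages) => ResumeOutcome::Replay(messages),
        Err(SessionError::ChannelClosed) => ResumeOutcome::CloseCleanly,
        Err(SessionError::Other(e)) => ResumeOutcome::ClientError(e),
    }
}
```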
Fixes #791
Motivation and Context
Priming events on POST request-wise SSE streams used a hardcoded event ID of `"0"`, making it impossible for clients to identify which stream to resume after disconnection. The MCP spec requires event IDs to encode enough information to correlate a `Last-Event-ID` back to the originating stream. This moves priming event generation for request-wise streams into the session layer (`LocalSessionManager::create_stream`), where the `http_request_id` is available, so the priming event ID is now correctly formatted as `0/<http_request_id>` (e.g. `0/0`, `0/1`). GET standalone and initialize priming remain unchanged at `"0"`, since they have no per-request stream identity.

Additionally, the event cache for a request-wise channel was discarded as soon as the response was delivered, so a client that disconnected and tried to resume after the tool call finished would find nothing to replay. The cache is now retained after completion, allowing late resume requests to replay cached events. Completed entries are evicted based on a configurable `completed_cache_ttl` (default 60s).

How Has This Been Tested?
Added `test_request_wise_priming_includes_http_request_id`, which verifies that consecutive tool calls get correct priming and response event IDs. All existing priming, SSE concurrent streams, stale session, and custom header tests continue to pass.

Breaking Changes
`SessionConfig` has two new fields: `sse_retry: Option<Duration>` (defaults to `Some(3s)`) and `completed_cache_ttl: Duration` (defaults to `60s`). Since the struct is `#[non_exhaustive]`, this is not a breaking change for downstream consumers.

Types of changes
Checklist