add MCP response compression (JSON to markdown/TSV/CSV)#1537
henrikrexed wants to merge 10 commits into agentgateway:main
Conversation
Pull request overview
Adds configurable MCP tool response “compression” at the proxy layer by converting certain JSON payloads (primarily arrays of objects) into more compact tabular text formats (markdown/TSV/CSV), wired end-to-end from CRD → controller translation → xDS/proto → agent runtime, with new metrics and tracing span propagation.
Changes:
- Introduces the `mcp::compress` module and hooks it into MCP relay streaming responses.
- Adds CRD/proto fields for per-target `responseCompression` and maps them into the agent IR.
- Adds Prometheus metrics and additional span wiring for improved trace correlation.
Reviewed changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 10 comments.
Summary per file:
| File | Description |
|---|---|
| schema/config.md | Documents new responseCompression config fields |
| crates/protos/proto/resource.proto | Adds MCPTarget.ResponseCompression proto message/field |
| crates/agentgateway/src/types/local_tests/mcp_normalized.snap | Snapshot updated for new field presence |
| crates/agentgateway/src/types/local.rs | Initializes new IR field in local config |
| crates/agentgateway/src/types/agent_xds.rs | Maps proto compression config into IR enum |
| crates/agentgateway/src/types/agent.rs | Adds response_compression to MCP target specs |
| crates/agentgateway/src/test_helpers/proxymock.rs | Updates test backend builders for new field |
| crates/agentgateway/src/telemetry/trc.rs | Propagates span_writer into PolicyClient |
| crates/agentgateway/src/telemetry/metrics.rs | Adds compression metrics + labels/buckets |
| crates/agentgateway/src/telemetry/log.rs | Adds child spans; updates PolicyClient construction |
| crates/agentgateway/src/proxy/httpproxy.rs | Extends PolicyClient with span_writer |
| crates/agentgateway/src/mcp/upstream/openapi/tests.rs | Updates tests for new PolicyClient fields |
| crates/agentgateway/src/mcp/upstream/mod.rs | Adds per-target compression format lookup |
| crates/agentgateway/src/mcp/router.rs | Passes metrics and span writer into MCP relay |
| crates/agentgateway/src/mcp/mod.rs | Exposes new compress module |
| crates/agentgateway/src/mcp/mcp_tests.rs | Updates MCP tests for new relay inputs/fields |
| crates/agentgateway/src/mcp/handler.rs | Wraps upstream stream with compress_stream + metrics |
| crates/agentgateway/src/mcp/compress_tests.rs | Adds standalone compression tests (currently unwired) |
| crates/agentgateway/src/mcp/compress.rs | Implements JSON→table conversion + unit tests |
| crates/agentgateway/src/llm/policy/mod.rs | Adds spans around guardrail actions |
| crates/agentgateway/src/llm/mod.rs | Passes span writer into prompt guard calls |
| controller/pkg/syncer/backend/translate.go | Translates CRD compression config into proto |
| controller/install/helm/agentgateway-crds/templates/agentgateway.dev_agentgatewaybackends.yaml | Updates CRD schema for new fields |
| controller/api/v1alpha1/agentgateway/zz_generated.deepcopy.go | Deepcopy updates for new CRD fields |
| controller/api/v1alpha1/agentgateway/agentgateway_backend_types.go | Adds CRD types for compression (+ selector protocol field) |
| architecture/mcp-response-compression.md | New design/architecture documentation |
| architecture/README.md | Adds doc link to new architecture page |
| api/resource_json.gen.go | Generated JSON marshal/unmarshal for new proto message |
| api/resource.pb.go | Generated Go proto bindings updated for new field/message |
```go
if rc := target.ResponseCompression; rc != nil && rc.Enabled {
	mcpTarget.ResponseCompression = &api.MCPTarget_ResponseCompression{
		Enabled: rc.Enabled,
		Format:  rc.Format,
	}
}
```
responseCompression is only translated for target.Static targets. When target.Selector is used, TranslateMCPSelectorTargets returns MCPTargets without applying target.ResponseCompression, so the CRD field will be silently ignored for selector-based targets. Consider applying the same ResponseCompression settings to each generated selector target (or rejecting the config if selector targets are intentionally unsupported).
```go
// Protocol is the protocol to use for the connection to the MCPBackend target.
// +optional
Protocol *MCPProtocol `json:"protocol,omitempty"`
```
McpTargetSelector now has a Protocol field, but protocol is already configurable on McpTarget (static.protocol) and the translator code does not read McpTargetSelector.Protocol at all. This creates API surface that appears configurable in the CRD but has no effect at runtime. Either wire this field through translation (and define precedence vs static.protocol / service appProtocol), or remove it from the CRD types/schema to avoid misleading users.
```rust
let labels = crate::telemetry::metrics::CompressionLabels {
	gateway: DefaultedUnknown::default(),
	listener: DefaultedUnknown::default(),
	route: DefaultedUnknown::default(),
	target: DefaultedUnknown::from(Some(agent_core::strng::new(&target_name))),
	format: DefaultedUnknown::from(Some(format)),
};
```
These compression metrics are labeled with gateway/listener/route, but the implementation populates those fields with DefaultedUnknown for every observation, so the emitted series won’t actually be segmented by route/listener. Either pass a RouteIdentifier/route context into compress_stream (if available) to populate these labels, or drop the unused labels to avoid high-cardinality label sets that don’t provide value.
```rust
metrics
	.mcp_response_compression_original_bytes
	.get_or_create(&labels)
	.observe(original_len);
metrics
	.mcp_response_compression_compressed_bytes
	.get_or_create(&labels)
	.observe(compressed_len);
metrics
	.mcp_response_compression_ratio
	.get_or_create(&labels)
	.observe(ratio);
metrics
	.mcp_response_compression_total
	.get_or_create(&labels)
	.inc();

text_content.text = compressed;
```
Compression is applied unconditionally whenever compress_response returns Some, even if the resulting text is larger than the original (ratio > 1.0). That can increase token usage and response size in the worst case. Consider only replacing text_content.text when compressed_len < original_len (or when ratio is below a threshold), and record metrics for attempted vs applied separately if needed.
Suggested change:

```rust
if compressed_len < original_len {
	metrics
		.mcp_response_compression_original_bytes
		.get_or_create(&labels)
		.observe(original_len);
	metrics
		.mcp_response_compression_compressed_bytes
		.get_or_create(&labels)
		.observe(compressed_len);
	metrics
		.mcp_response_compression_ratio
		.get_or_create(&labels)
		.observe(ratio);
	metrics
		.mcp_response_compression_total
		.get_or_create(&labels)
		.inc();
	text_content.text = compressed;
} else {
	metrics
		.mcp_response_compression_skipped_total
		.get_or_create(&labels)
		.inc();
}
```
```rust
	result.push_str("| ");
	let values: Vec<String> = all_keys
		.iter()
		.map(|key| obj.get(key).map(render_value).unwrap_or_default())
		.collect();
	result.push_str(&values.join(" | "));
	result.push_str(" |\n");
}
```
Markdown/TSV output is built by joining rendered cell strings directly. If a JSON string contains | or newlines, it will break the markdown table structure; if it contains tabs/newlines, it will corrupt TSV row boundaries. Consider escaping/sanitizing cell values per output format (e.g., replace | with \| for markdown, and replace \t/\n for TSV) so compression preserves the data and produces valid tabular output.
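A minimal sketch of the per-format escaping the comment asks for (helper names are hypothetical, not taken from the PR):

```rust
/// Escape a rendered cell for markdown table output: an unescaped pipe would
/// terminate the cell, and a newline would break the row.
fn escape_markdown_cell(cell: &str) -> String {
    cell.replace('|', "\\|").replace('\n', " ")
}

/// Escape a rendered cell for TSV output: tabs and newlines are the
/// field/record delimiters, so replace both with spaces in one pass.
fn escape_tsv_cell(cell: &str) -> String {
    cell.replace(['\t', '\n'], " ")
}

fn main() {
    assert_eq!(escape_markdown_cell("a|b"), "a\\|b");
    assert_eq!(escape_tsv_cell("a\tb\nc"), "a b c");
}
```

CSV is the only one of the three formats with a standard quoting convention; markdown and TSV need ad-hoc substitution like the above.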
MCP tool call responses frequently contain large JSON payloads — lists of database rows, API results, file metadata — that consume significant context window tokens when consumed by LLM-based agents. Response compression converts these JSON payloads into compact tabular formats (markdown, TSV, CSV) at the proxy layer, reducing token usage without losing information.

This document covers the design and architecture of the compression pipeline.

## Design Decisions

### Why at the proxy layer

Compression could happen at the MCP server, the client, or the proxy. Doing it at the proxy has several advantages:

* **No upstream changes required.** MCP servers return standard JSON; compression is transparent.
* **Per-target configuration.** Different backends may benefit from different formats — a data-heavy API might use TSV while a human-readable tool uses markdown.
* **Consistent behavior.** All clients benefit without each needing its own compression logic.

The tradeoff is that the proxy must parse and re-serialize JSON, adding latency proportional to response size. In practice this is small relative to the upstream call and LLM processing time.

### Format selection

The three formats target different consumption patterns:

* **Markdown** — best for LLMs that handle markdown well (most do). Preserves readability.
* **TSV** — minimal overhead; no escaping needed for most data. Good for structured pipelines.
* **CSV** — standard interchange format with proper escaping. Useful when downstream tooling expects CSV.

The `none` default means compression is opt-in; existing behavior is unchanged.
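The exact schema lives in `schema/config.md`; as a loose illustration of per-target opt-in, the `responseCompression` shape from this PR might appear in static config like this (the surrounding keys are hypothetical, only the `enabled`/`format` pair comes from the PR):

```yaml
backends:
  - mcp:
      targets:
        - name: db-tools            # hypothetical target name
          sse:
            host: mcp.internal      # hypothetical upstream
            port: 8080
          responseCompression:
            enabled: true
            format: markdown        # markdown | tsv | csv | none
```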
### What gets compressed

Compression targets `CallToolResult` messages containing text content that parses as JSON. The converter handles three shapes:

* **Arrays of objects** — the common case (e.g., database query results). Each object becomes a row; keys become column headers.
* **Wrapper objects with array fields** — scalar fields are preserved as header lines above the table, array fields are rendered as tables. This handles paginated API responses that wrap results in metadata.
* **Nested values** — arrays of 5 or fewer items are shown inline; larger arrays show the first 5 items plus a count. Nested objects display as `{...}`.
The design docs and PR description claim response compression is “without losing information”, but the implementation intentionally replaces nested objects with {...} and truncates arrays >5 items (see render_value). That is lossy. Consider updating this statement to reflect the lossy summarization behavior, or change the formatter to preserve nested values (e.g., render nested objects/arrays as compact JSON) when format is TSV/CSV/markdown.
```rust
pub fn get_compression_format(&self, service_name: &str) -> Option<CompressionFormat> {
	for tgt in &self.backend.targets {
		if tgt.name.as_str() == service_name {
			return match &tgt.spec {
				McpTargetSpec::Sse(s) => s.response_compression,
				McpTargetSpec::Mcp(s) => s.response_compression,
				McpTargetSpec::OpenAPI(s) => s.response_compression,
				McpTargetSpec::Stdio { .. } => None,
			};
		}
	}
	None
}
```
UpstreamGroup::get_compression_format linearly scans self.backend.targets on every call. Since send_single calls this per request, this becomes O(num_targets) overhead in the hot path. Consider precomputing a HashMap<target_name, Option<CompressionFormat>> during UpstreamGroup::new/setup_connections so lookups are O(1).
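A map built once at construction makes the hot-path lookup O(1). Sketch with hypothetical stand-in types for the real target specs:

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Debug)]
enum CompressionFormat { Markdown, Tsv, Csv }

/// Hypothetical stand-in for UpstreamGroup: the name -> format map is built
/// once (e.g., in new/setup_connections) instead of scanning targets per call.
struct UpstreamGroup {
    compression_by_target: HashMap<String, CompressionFormat>,
}

impl UpstreamGroup {
    fn new(targets: &[(String, Option<CompressionFormat>)]) -> Self {
        let compression_by_target = targets
            .iter()
            .filter_map(|(name, fmt)| fmt.map(|f| (name.clone(), f)))
            .collect();
        Self { compression_by_target }
    }

    /// O(1) lookup in the per-request hot path.
    fn get_compression_format(&self, service_name: &str) -> Option<CompressionFormat> {
        self.compression_by_target.get(service_name).copied()
    }
}

fn main() {
    let g = UpstreamGroup::new(&[
        ("db".to_string(), Some(CompressionFormat::Tsv)),
        ("files".to_string(), None),
    ]);
    assert_eq!(g.get_compression_format("db"), Some(CompressionFormat::Tsv));
    assert_eq!(g.get_compression_format("files"), None);
}
```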
```rust
let compression = s
	.response_compression
	.as_ref()
	.filter(|rc| rc.enabled)
	.map(|rc| match rc.format.as_str() {
		"markdown" => crate::mcp::compress::CompressionFormat::Markdown,
		"tsv" => crate::mcp::compress::CompressionFormat::Tsv,
		"csv" => crate::mcp::compress::CompressionFormat::Csv,
		_ => crate::mcp::compress::CompressionFormat::None,
	});
```
Unknown/empty response_compression.format values currently map to Some(CompressionFormat::None) when enabled=true, which still causes the handler to attempt compression (JSON parse) and emit “skipped” metrics. Consider treating format="none" and unknown values as disabled (None) or returning a validation error so misconfiguration doesn’t add per-response overhead silently.
Suggested change:

```rust
let compression = s.response_compression.as_ref().and_then(|rc| {
	if !rc.enabled {
		return None;
	}
	match rc.format.as_str() {
		"markdown" => Some(crate::mcp::compress::CompressionFormat::Markdown),
		"tsv" => Some(crate::mcp::compress::CompressionFormat::Tsv),
		"csv" => Some(crate::mcp::compress::CompressionFormat::Csv),
		"none" | "" => None,
		_ => None,
	}
});
```
```rust
		// Look for a top-level key whose value is an array of objects
		for (key, value) in obj.iter() {
			if let Value::Array(arr) = value
				&& arr.iter().all(|v| v.is_object())
				&& !arr.is_empty()
			{
				let mut result = String::new();

				// Add scalar fields as header lines
				for (k, v) in obj.iter() {
					if k != key && !v.is_array() {
						result.push_str(&format!("{}: {}\n", k, render_value(v)));
					}
				}
				if !result.is_empty() {
					result.push('\n');
				}

				result.push_str(&convert_array_to_table(arr, format));
				return Some(result);
			}
		}
		None
	},
	_ => None,
	}
}
```
For wrapper objects, the implementation compresses only the first top-level array-of-objects field it encounters and ignores any additional array fields. The PR description/docs mention handling “array fields” (plural), and some APIs return multiple tabular arrays (e.g., items + included). Consider iterating all qualifying array fields (or applying a deterministic selection like items/results/data) so behavior is predictable and matches the documented intent.
Suggested change:

```rust
		let array_fields = collect_array_object_fields(obj);
		if array_fields.is_empty() {
			return None;
		}
		let mut result = String::new();
		// Add scalar fields as header lines
		for (k, v) in obj.iter() {
			if !v.is_array() {
				result.push_str(&format!("{}: {}\n", k, render_value(v)));
			}
		}
		if !result.is_empty() {
			result.push('\n');
		}
		let multiple_arrays = array_fields.len() > 1;
		for (index, (key, arr)) in array_fields.iter().enumerate() {
			if multiple_arrays {
				result.push_str(&format!("{}:\n", key));
			}
			result.push_str(&convert_array_to_table(arr, format));
			if index + 1 < array_fields.len() {
				result.push_str("\n\n");
			}
		}
		Some(result)
	},
	_ => None,
	}
}

fn collect_array_object_fields<'a>(
	obj: &'a serde_json::Map<String, Value>,
) -> Vec<(&'a String, &'a Vec<Value>)> {
	obj.iter()
		.filter_map(|(key, value)| match value {
			Value::Array(arr) if !arr.is_empty() && arr.iter().all(|v| v.is_object()) => {
				Some((key, arr))
			},
			_ => None,
		})
		.collect()
}
```
```rust
#[cfg(test)]
mod tests {
	use super::*;
```
This file’s tests won’t run as written: crates/agentgateway/src/mcp/mod.rs does not declare mod compress_tests, so this module isn’t compiled, and use super::*; would also not bring compress_response/CompressionFormat into scope even if it were. Either remove this file and keep the unit tests in compress.rs, or add the proper #[cfg(test)] mod compress_tests; wiring and import from crate::mcp::compress explicitly.
| use super::*; | |
| use crate::mcp::compress::{compress_response, CompressionFormat}; |
Add response compression for MCP tool call results, reducing token usage when LLMs consume structured JSON responses. Supports three output formats: markdown tables, TSV, and CSV.

The compression is configurable per-backend via the `responseCompression` field in both static config and the AgentgatewayBackend CRD. When enabled, JSON array/object responses from MCP tool calls are automatically converted to the specified tabular format.

Key changes:
- New `compress` module with format conversion logic and tests
- Handler integration to compress CallToolResult content
- Proto/xDS extension for responseCompression configuration
- Test helpers and snapshot updates for the new field

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Signed-off-by: Henrik Rexed <henrik.rexed@gmail.com>
Wire the responseCompression configuration from the Kubernetes CRD through the xDS protocol to the proxy. Adds the ResponseCompression type to the backend spec with enabled/format fields, updates the deepcopy generator output, Helm CRD templates, and the syncer translation layer.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Signed-off-by: Henrik Rexed <henrik.rexed@gmail.com>

Create proper child spans for correlated telemetry events instead of logging them as independent entries. This improves trace correlation for LLM calls, MCP operations, and guardrail checks.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Signed-off-by: Henrik Rexed <henrik.rexed@gmail.com>

Add Prometheus metrics to track compression operations: request counts by format, compression ratios, original/compressed sizes, and processing duration. Enables monitoring of compression effectiveness across backends.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Signed-off-by: Henrik Rexed <henrik.rexed@gmail.com>

Document the response compression feature including configuration options, supported formats, metrics, and architecture overview.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Signed-off-by: Henrik Rexed <henrik.rexed@gmail.com>
- Delete orphan compress_tests.rs (tests already inline in compress.rs)
- Add cell escaping: pipe chars in markdown, tabs/newlines in TSV
- Skip compression when result is larger than original
- Simplify metrics labels to target+format (remove unused gateway/listener/route)
- Fix xDS format mapping: unknown format with enabled=true treated as disabled
- Wire responseCompression for selector-based targets in translate.go
- Remove unwired Protocol field from McpTargetSelector CRD
- Fix docs: note lossy summarization for nested objects/large arrays

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Signed-off-by: Henrik Rexed <henrik.rexed@gmail.com>
PolicyClient and Relay::new() gained a span_writer field and a 4th metrics argument after the rebase; six test call sites were not updated.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Signed-off-by: Henrik Rexed <henrik.rexed@gmail.com>

…initializers: OIDC and LLM tests were missing the span_writer field on PolicyClient, and MCP tests were missing response_compression on SseTargetSpec and OpenAPITarget, all added during the compression feature rebase.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Signed-off-by: Henrik Rexed <henrik.rexed@gmail.com>

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Signed-off-by: Henrik Rexed <henrik.rexed@gmail.com>
- Update new stateless_multiplex_delete_session_skips_uninitialized_targets test to include span_writer + metrics args after rebase onto main
- Collapse consecutive str::replace in escape_tsv (clippy collapsible_str_replace under -D warnings)

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Signed-off-by: Henrik Rexed <henrik.rexed@gmail.com>
MCP tool call responses often contain large JSON payloads (database rows, API results, file metadata) that consume significant context window tokens for LLM-based agents. This PR adds configurable response compression that converts JSON payloads into compact tabular formats at the proxy layer, reducing token usage without losing information.
Compression is transparent to both upstream servers (they return standard JSON) and clients (they receive already-converted text content in tool call responses).
What's included
**Core compression module** (`mcp/compress.rs`)
- JSON→table conversion for markdown/TSV/CSV; nested objects render as `{...}`
- Standalone tests in `compress_tests.rs`

**Handler integration** (`mcp/handler.rs`)
- `compress_stream()` wraps upstream response streams
- Rewrites `CallToolResult` messages with text content

**CRD + xDS wiring**
- `responseCompression` field on AgentgatewayBackend MCP targets (`enabled` bool + `format` string)
- `translate.go` → `MCPTarget.ResponseCompression` proto message
- `agent_xds.rs` maps the proto config to the internal `CompressionFormat` enum

**Metrics** (`telemetry/metrics.rs`)
- `mcp_response_compression_total` / `mcp_response_compression_skipped_total` — compressed vs skipped counts
- `mcp_response_compression_original_bytes` / `mcp_response_compression_compressed_bytes` — size histograms
- `mcp_response_compression_ratio` — compression ratio (0.0–1.0)
- Labels: `gateway`, `listener`, `route`, `target`, `format`

**Telemetry** (`telemetry/log.rs`, `telemetry/trc.rs`)

**Documentation** (`architecture/mcp-response-compression.md`)

Configuration
Example
Before (JSON, ~250 tokens):

```json
[
  {"name": "web-server", "status": "running", "cpu": 45.2, "memory": 1024},
  {"name": "db-primary", "status": "running", "cpu": 78.1, "memory": 4096}
]
```

After (markdown, ~60 tokens):
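An illustrative rendering of that payload under the described conversion (the PR's actual output was not captured in this page; column order here assumes key order is preserved):

```
| name | status | cpu | memory |
|---|---|---|---|
| web-server | running | 45.2 | 1024 |
| db-primary | running | 78.1 | 4096 |
```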
Test plan
- Unit tests (`compress_tests.rs`) — arrays, nested values, wrapper objects, non-convertible inputs
- Snapshot and test-helper updates covering the new `response_compression` field