
add MCP response compression (JSON to markdown/TSV/CSV) #1537

Open
henrikrexed wants to merge 10 commits into agentgateway:main from henrikrexed:feat/mcp-response-compression-upstream

@henrikrexed

MCP tool call responses often contain large JSON payloads (database rows, API results, file metadata) that consume significant context window tokens for LLM-based agents. This PR adds configurable response compression that converts JSON payloads into compact tabular formats at the proxy layer, reducing token usage without losing information.

Compression is transparent to both upstream servers (they return standard JSON) and clients (they receive already-converted text content in tool call responses).

What's included

Core compression module (mcp/compress.rs)

  • Converts JSON arrays of objects into markdown tables, TSV, or CSV
  • Handles wrapper objects with scalar metadata + array fields (e.g., paginated API responses)
  • Nested values: small arrays shown inline, larger arrays summarized, nested objects shown as {...}
  • Non-tabular JSON passes through unchanged
  • Unit tests in compress_tests.rs

Handler integration (mcp/handler.rs)

  • compress_stream() wraps upstream response streams
  • Intercepts CallToolResult messages with text content
  • Inline compression without full response buffering

CRD + xDS wiring

  • New responseCompression field on AgentgatewayBackend MCP targets (enabled bool + format string)
  • Controller translation in translate.go → MCPTarget.ResponseCompression proto message
  • xDS → IR mapping in agent_xds.rs to internal CompressionFormat enum

Metrics (telemetry/metrics.rs)

  • mcp_response_compression_total / mcp_response_compression_skipped_total — compressed vs skipped counts
  • mcp_response_compression_original_bytes / mcp_response_compression_compressed_bytes — size histograms
  • mcp_response_compression_ratio — compression ratio (0.0–1.0)
  • All metrics labeled by gateway, listener, route, target, format
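
A minimal sketch of how the ratio observation could be computed, assuming the ratio is defined as compressed bytes divided by original bytes (the description gives only the 0.0–1.0 range, not the formula). The function name `compression_ratio` is hypothetical and not taken from the PR.

```rust
/// Hypothetical helper: ratio of output size to input size.
/// Assumes ratio = compressed / original; returns None for an empty input
/// so the caller never divides by zero.
fn compression_ratio(original_len: usize, compressed_len: usize) -> Option<f64> {
    if original_len == 0 {
        return None;
    }
    Some(compressed_len as f64 / original_len as f64)
}
```

Under this definition a value above 1.0 would mean the tabular output is larger than the JSON it replaced, which is worth watching for in the histogram.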

Telemetry (telemetry/log.rs, telemetry/trc.rs)

  • Adds child spans for correlated events (LLM, MCP, guardrail) for better trace correlation

Documentation (architecture/mcp-response-compression.md)

  • Design decisions, architecture overview, configuration flow, runtime pipeline

Configuration

apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
spec:
  mcp:
    targets:
    - name: my-server
      backendRef:
        name: my-mcp-service
        port: 8080
      responseCompression:
        enabled: true
        format: "markdown"    # or "tsv", "csv"
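
One way the `enabled` and `format` fields above might be resolved into an internal format is sketched below. This is an illustration under assumptions, not the PR's code: the enum is a simplified stand-in for the PR's `CompressionFormat`, and the `resolve` function and its error type are hypothetical.

```rust
use std::str::FromStr;

/// Simplified stand-in for the PR's CompressionFormat enum (assumption).
#[derive(Debug, Clone, Copy, PartialEq)]
enum CompressionFormat {
    Markdown,
    Tsv,
    Csv,
}

impl FromStr for CompressionFormat {
    type Err = String;

    /// Accept only the three documented format names; anything else is an error.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "markdown" => Ok(Self::Markdown),
            "tsv" => Ok(Self::Tsv),
            "csv" => Ok(Self::Csv),
            other => Err(format!("unsupported responseCompression format: {other:?}")),
        }
    }
}

/// Hypothetical resolver: a disabled config yields Ok(None) without looking
/// at the format; an enabled config must name a known format.
fn resolve(enabled: bool, format_name: &str) -> Result<Option<CompressionFormat>, String> {
    if !enabled {
        return Ok(None);
    }
    format_name.parse::<CompressionFormat>().map(Some)
}
```

Rejecting unknown names at configuration time is one possible design; silently treating them as disabled is another.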

Example

Before (JSON, ~250 tokens):

[
  {"name": "web-server", "status": "running", "cpu": 45.2, "memory": 1024},
  {"name": "db-primary", "status": "running", "cpu": 78.1, "memory": 4096}
]

After (markdown, ~60 tokens):

| name | status | cpu | memory |
| --- | --- | --- | --- |
| web-server | running | 45.2 | 1024 |
| db-primary | running | 78.1 | 4096 |
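
The conversion shown above can be sketched roughly as follows. To stay free of external crates, the sketch assumes rows have already been parsed into (key, value) string pairs; the `Row` alias and `rows_to_markdown` are hypothetical names, and the real module works on parsed JSON values instead.

```rust
/// Hypothetical row type: already-parsed (key, value) pairs in source order.
type Row = Vec<(String, String)>;

/// Render rows as a markdown table. Columns are the union of all keys in
/// first-seen order; a row that lacks a key gets an empty cell.
fn rows_to_markdown(rows: &[Row]) -> String {
    if rows.is_empty() {
        return String::new();
    }

    // Collect column names, preserving the order in which they first appear.
    let mut columns: Vec<&str> = Vec::new();
    for row in rows {
        for (key, _) in row {
            if !columns.contains(&key.as_str()) {
                columns.push(key.as_str());
            }
        }
    }

    // Header row followed by the markdown separator row.
    let mut out = format!("| {} |\n", columns.join(" | "));
    let rule: Vec<&str> = columns.iter().map(|_| "---").collect();
    out.push_str(&format!("| {} |\n", rule.join(" | ")));

    // One table row per input row.
    for row in rows {
        let cells: Vec<&str> = columns
            .iter()
            .map(|col| {
                row.iter()
                    .find(|(key, _)| key.as_str() == *col)
                    .map(|(_, value)| value.as_str())
                    .unwrap_or("")
            })
            .collect();
        out.push_str(&format!("| {} |\n", cells.join(" | ")));
    }
    out
}
```

The saving comes from writing each key once in the header instead of once per object.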

Test plan

  • Unit tests for compression module (compress_tests.rs) — arrays, nested values, wrapper objects, non-convertible inputs
  • Snapshot test updates for new response_compression field
  • MCP handler test updates

@henrikrexed henrikrexed requested a review from a team as a code owner April 14, 2026 09:37
Copilot AI review requested due to automatic review settings April 14, 2026 09:37

Copilot AI left a comment

Pull request overview

Adds configurable MCP tool response “compression” at the proxy layer by converting certain JSON payloads (primarily arrays of objects) into more compact tabular text formats (markdown/TSV/CSV), wired end-to-end from CRD → controller translation → xDS/proto → agent runtime, with new metrics and tracing span propagation.

Changes:

  • Introduces mcp::compress module and hooks it into MCP relay streaming responses.
  • Adds CRD/proto fields for per-target responseCompression and maps it into the agent IR.
  • Adds Prometheus metrics and additional span wiring for improved trace correlation.

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| schema/config.md | Documents new responseCompression config fields |
| crates/protos/proto/resource.proto | Adds MCPTarget.ResponseCompression proto message/field |
| crates/agentgateway/src/types/local_tests/mcp_normalized.snap | Snapshot updated for new field presence |
| crates/agentgateway/src/types/local.rs | Initializes new IR field in local config |
| crates/agentgateway/src/types/agent_xds.rs | Maps proto compression config into IR enum |
| crates/agentgateway/src/types/agent.rs | Adds response_compression to MCP target specs |
| crates/agentgateway/src/test_helpers/proxymock.rs | Updates test backend builders for new field |
| crates/agentgateway/src/telemetry/trc.rs | Propagates span_writer into PolicyClient |
| crates/agentgateway/src/telemetry/metrics.rs | Adds compression metrics + labels/buckets |
| crates/agentgateway/src/telemetry/log.rs | Adds child spans; updates PolicyClient construction |
| crates/agentgateway/src/proxy/httpproxy.rs | Extends PolicyClient with span_writer |
| crates/agentgateway/src/mcp/upstream/openapi/tests.rs | Updates tests for new PolicyClient fields |
| crates/agentgateway/src/mcp/upstream/mod.rs | Adds per-target compression format lookup |
| crates/agentgateway/src/mcp/router.rs | Passes metrics and span writer into MCP relay |
| crates/agentgateway/src/mcp/mod.rs | Exposes new compress module |
| crates/agentgateway/src/mcp/mcp_tests.rs | Updates MCP tests for new relay inputs/fields |
| crates/agentgateway/src/mcp/handler.rs | Wraps upstream stream with compress_stream + metrics |
| crates/agentgateway/src/mcp/compress_tests.rs | Adds standalone compression tests (currently unwired) |
| crates/agentgateway/src/mcp/compress.rs | Implements JSON→table conversion + unit tests |
| crates/agentgateway/src/llm/policy/mod.rs | Adds spans around guardrail actions |
| crates/agentgateway/src/llm/mod.rs | Passes span writer into prompt guard calls |
| controller/pkg/syncer/backend/translate.go | Translates CRD compression config into proto |
| controller/install/helm/agentgateway-crds/templates/agentgateway.dev_agentgatewaybackends.yaml | Updates CRD schema for new fields |
| controller/api/v1alpha1/agentgateway/zz_generated.deepcopy.go | Deepcopy updates for new CRD fields |
| controller/api/v1alpha1/agentgateway/agentgateway_backend_types.go | Adds CRD types for compression (+ selector protocol field) |
| architecture/mcp-response-compression.md | New design/architecture documentation |
| architecture/README.md | Adds doc link to new architecture page |
| api/resource_json.gen.go | Generated JSON marshal/unmarshal for new proto message |
| api/resource.pb.go | Generated Go proto bindings updated for new field/message |


Comment on lines +289 to +294
if rc := target.ResponseCompression; rc != nil && rc.Enabled {
	mcpTarget.ResponseCompression = &api.MCPTarget_ResponseCompression{
		Enabled: rc.Enabled,
		Format:  rc.Format,
	}
}

Copilot AI Apr 14, 2026

responseCompression is only translated for target.Static targets. When target.Selector is used, TranslateMCPSelectorTargets returns MCPTargets without applying target.ResponseCompression, so the CRD field will be silently ignored for selector-based targets. Consider applying the same ResponseCompression settings to each generated selector target (or rejecting the config if selector targets are intentionally unsupported).

Copilot uses AI. Check for mistakes.
Comment on lines +439 to +442
// Protocol is the protocol to use for the connection to the MCPBackend target.
// +optional
Protocol *MCPProtocol `json:"protocol,omitempty"`

Copilot AI Apr 14, 2026

McpTargetSelector now has a Protocol field, but protocol is already configurable on McpTarget (static.protocol) and the translator code does not read McpTargetSelector.Protocol at all. This creates API surface that appears configurable in the CRD but has no effect at runtime. Either wire this field through translation (and define precedence vs static.protocol / service appProtocol), or remove it from the CRD types/schema to avoid misleading users.

Suggested change
-	// Protocol is the protocol to use for the connection to the MCPBackend target.
-	// +optional
-	Protocol *MCPProtocol `json:"protocol,omitempty"`

Comment on lines +594 to +600
let labels = crate::telemetry::metrics::CompressionLabels {
    gateway: DefaultedUnknown::default(),
    listener: DefaultedUnknown::default(),
    route: DefaultedUnknown::default(),
    target: DefaultedUnknown::from(Some(agent_core::strng::new(&target_name))),
    format: DefaultedUnknown::from(Some(format)),
};

Copilot AI Apr 14, 2026

These compression metrics are labeled with gateway/listener/route, but the implementation populates those fields with DefaultedUnknown for every observation, so the emitted series won’t actually be segmented by route/listener. Either pass a RouteIdentifier/route context into compress_stream (if available) to populate these labels, or drop the unused labels to avoid high-cardinality label sets that don’t provide value.

Comment thread crates/agentgateway/src/mcp/handler.rs Outdated
Comment on lines +613 to +630
metrics
    .mcp_response_compression_original_bytes
    .get_or_create(&labels)
    .observe(original_len);
metrics
    .mcp_response_compression_compressed_bytes
    .get_or_create(&labels)
    .observe(compressed_len);
metrics
    .mcp_response_compression_ratio
    .get_or_create(&labels)
    .observe(ratio);
metrics
    .mcp_response_compression_total
    .get_or_create(&labels)
    .inc();

text_content.text = compressed;

Copilot AI Apr 14, 2026

Compression is applied unconditionally whenever compress_response returns Some, even if the resulting text is larger than the original (ratio > 1.0). That can increase token usage and response size in the worst case. Consider only replacing text_content.text when compressed_len < original_len (or when ratio is below a threshold), and record metrics for attempted vs applied separately if needed.

Suggested change
- metrics
-     .mcp_response_compression_original_bytes
-     .get_or_create(&labels)
-     .observe(original_len);
- metrics
-     .mcp_response_compression_compressed_bytes
-     .get_or_create(&labels)
-     .observe(compressed_len);
- metrics
-     .mcp_response_compression_ratio
-     .get_or_create(&labels)
-     .observe(ratio);
- metrics
-     .mcp_response_compression_total
-     .get_or_create(&labels)
-     .inc();
- text_content.text = compressed;
+ if compressed_len < original_len {
+     metrics
+         .mcp_response_compression_original_bytes
+         .get_or_create(&labels)
+         .observe(original_len);
+     metrics
+         .mcp_response_compression_compressed_bytes
+         .get_or_create(&labels)
+         .observe(compressed_len);
+     metrics
+         .mcp_response_compression_ratio
+         .get_or_create(&labels)
+         .observe(ratio);
+     metrics
+         .mcp_response_compression_total
+         .get_or_create(&labels)
+         .inc();
+     text_content.text = compressed;
+ } else {
+     metrics
+         .mcp_response_compression_skipped_total
+         .get_or_create(&labels)
+         .inc();
+ }

Comment on lines +122 to +129
    result.push_str("| ");
    let values: Vec<String> = all_keys
        .iter()
        .map(|key| obj.get(key).map(render_value).unwrap_or_default())
        .collect();
    result.push_str(&values.join(" | "));
    result.push_str(" |\n");
}

Copilot AI Apr 14, 2026

Markdown/TSV output is built by joining rendered cell strings directly. If a JSON string contains | or newlines, it will break the markdown table structure; if it contains tabs/newlines, it will corrupt TSV row boundaries. Consider escaping/sanitizing cell values per output format (e.g., replace | with \| for markdown, and replace \t/\n for TSV) so compression preserves the data and produces valid tabular output.
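
A minimal sketch of the per-format sanitizing suggested here, using only the standard library. The function names are hypothetical, and replacing line breaks and tabs with spaces is an assumption of this sketch (it keeps the table valid but does not round-trip the original whitespace).

```rust
/// Hypothetical helper: make a cell safe for a markdown table.
/// Pipes are backslash-escaped; line breaks become spaces so that one
/// value stays on one table row.
fn escape_markdown_cell(cell: &str) -> String {
    cell.replace('|', "\\|").replace(['\r', '\n'], " ")
}

/// Hypothetical helper: make a cell safe for TSV.
/// Tabs and line breaks would be read as column or row separators,
/// so each is replaced with a single space.
fn escape_tsv_cell(cell: &str) -> String {
    cell.replace(['\t', '\r', '\n'], " ")
}
```

Each rendered cell would pass through the helper matching the target format before the cells are joined into a row.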

Comment on lines +3 to +36
MCP tool call responses frequently contain large JSON payloads — lists of database rows, API results, file metadata — that consume significant context window tokens when consumed by LLM-based agents. Response compression converts these JSON payloads into compact tabular formats (markdown, TSV, CSV) at the proxy layer, reducing token usage without losing information.

This document covers the design and architecture of the compression pipeline.

## Design Decisions

### Why at the proxy layer

Compression could happen at the MCP server, the client, or the proxy. Doing it at the proxy has several advantages:

* **No upstream changes required.** MCP servers return standard JSON; compression is transparent.
* **Per-target configuration.** Different backends may benefit from different formats — a data-heavy API might use TSV while a human-readable tool uses markdown.
* **Consistent behavior.** All clients benefit without each needing its own compression logic.

The tradeoff is that the proxy must parse and re-serialize JSON, adding latency proportional to response size. In practice this is small relative to the upstream call and LLM processing time.

### Format selection

The three formats target different consumption patterns:

* **Markdown** — best for LLMs that handle markdown well (most do). Preserves readability.
* **TSV** — minimal overhead, no escaping needed for most data. Good for structured pipelines.
* **CSV** — standard interchange format with proper escaping. Useful when downstream tooling expects CSV.

The `none` default means compression is opt-in; existing behavior is unchanged.

### What gets compressed

Compression targets `CallToolResult` messages containing text content that parses as JSON. The converter handles three shapes:

* **Arrays of objects** — the common case (e.g., database query results). Each object becomes a row, keys become column headers.
* **Wrapper objects with array fields** — scalar fields are preserved as header lines above the table, array fields are rendered as tables. This handles paginated API responses that wrap results in metadata.
* **Nested values** — arrays of 5 or fewer items are shown inline; larger arrays show the first 5 items plus a count. Nested objects display as `{...}`.

Copilot AI Apr 14, 2026

The design docs and PR description claim response compression is “without losing information”, but the implementation intentionally replaces nested objects with {...} and truncates arrays >5 items (see render_value). That is lossy. Consider updating this statement to reflect the lossy summarization behavior, or change the formatter to preserve nested values (e.g., render nested objects/arrays as compact JSON) when format is TSV/CSV/markdown.
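
To make the lossy behavior concrete, here is a rough sketch of the summarization rule described in the quoted docs (arrays of up to 5 items inline, longer arrays cut to the first 5 plus a count, nested objects as `{...}`). The `Val` enum is a hypothetical stand-in for a parsed JSON value, and the exact `(+N more)` wording is an assumption, not the PR's output.

```rust
/// Hypothetical stand-in for a parsed JSON value.
enum Val {
    Scalar(String),
    Array(Vec<Val>),
    Object,
}

/// Maximum number of array items rendered inline (taken from the quoted docs).
const INLINE_LIMIT: usize = 5;

/// Sketch of the summarizing renderer: anything beyond the first
/// INLINE_LIMIT array items, and every nested object body, is dropped.
fn render_value_sketch(v: &Val) -> String {
    match v {
        Val::Scalar(s) => s.clone(),
        Val::Object => "{...}".to_string(),
        Val::Array(items) => {
            let shown: Vec<String> = items
                .iter()
                .take(INLINE_LIMIT)
                .map(render_value_sketch)
                .collect();
            let mut out = shown.join(", ");
            if items.len() > INLINE_LIMIT {
                // The suffix format is an assumption made for this sketch.
                out.push_str(&format!(" (+{} more)", items.len() - INLINE_LIMIT));
            }
            out
        }
    }
}
```

A seven-item array and a nested object both lose data under this rule, which is why "without losing information" does not hold for such inputs.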

Comment on lines +392 to +404
pub fn get_compression_format(&self, service_name: &str) -> Option<CompressionFormat> {
    for tgt in &self.backend.targets {
        if tgt.name.as_str() == service_name {
            return match &tgt.spec {
                McpTargetSpec::Sse(s) => s.response_compression,
                McpTargetSpec::Mcp(s) => s.response_compression,
                McpTargetSpec::OpenAPI(s) => s.response_compression,
                McpTargetSpec::Stdio { .. } => None,
            };
        }
    }
    None
}

Copilot AI Apr 14, 2026

UpstreamGroup::get_compression_format linearly scans self.backend.targets on every call. Since send_single calls this per request, this becomes O(num_targets) overhead in the hot path. Consider precomputing a HashMap<target_name, Option<CompressionFormat>> during UpstreamGroup::new/setup_connections so lookups are O(1).
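
The precomputed lookup could look roughly like the sketch below. `Target` and `FormatIndex` are hypothetical, simplified types introduced for illustration; the real `UpstreamGroup` and target specs carry more fields, and the enum here is a reduced stand-in for the PR's `CompressionFormat`.

```rust
use std::collections::HashMap;

/// Reduced stand-in for the PR's CompressionFormat enum (assumption).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum CompressionFormat {
    Markdown,
    Tsv,
    Csv,
}

/// Hypothetical, simplified target: a name plus an optional format.
struct Target {
    name: String,
    response_compression: Option<CompressionFormat>,
}

/// Hypothetical index built once at setup time.
struct FormatIndex {
    by_name: HashMap<String, Option<CompressionFormat>>,
}

impl FormatIndex {
    /// Walk the targets once and remember each target's format by name.
    fn new(targets: &[Target]) -> Self {
        let by_name = targets
            .iter()
            .map(|t| (t.name.clone(), t.response_compression))
            .collect();
        Self { by_name }
    }

    /// Average O(1) lookup; unknown targets and targets without
    /// compression both yield None.
    fn get(&self, name: &str) -> Option<CompressionFormat> {
        self.by_name.get(name).copied().flatten()
    }
}
```

With only a handful of targets per backend the linear scan may well be fast enough, so this is a trade of a little setup work and memory for a constant-time hot path.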

Comment on lines +919 to +929
let compression = s
    .response_compression
    .as_ref()
    .filter(|rc| rc.enabled)
    .map(|rc| match rc.format.as_str() {
        "markdown" => crate::mcp::compress::CompressionFormat::Markdown,
        "tsv" => crate::mcp::compress::CompressionFormat::Tsv,
        "csv" => crate::mcp::compress::CompressionFormat::Csv,
        _ => crate::mcp::compress::CompressionFormat::None,
    });

Copilot AI Apr 14, 2026

Unknown/empty response_compression.format values currently map to Some(CompressionFormat::None) when enabled=true, which still causes the handler to attempt compression (JSON parse) and emit “skipped” metrics. Consider treating format="none" and unknown values as disabled (None) or returning a validation error so misconfiguration doesn’t add per-response overhead silently.

Suggested change
- let compression = s
-     .response_compression
-     .as_ref()
-     .filter(|rc| rc.enabled)
-     .map(|rc| match rc.format.as_str() {
-         "markdown" => crate::mcp::compress::CompressionFormat::Markdown,
-         "tsv" => crate::mcp::compress::CompressionFormat::Tsv,
-         "csv" => crate::mcp::compress::CompressionFormat::Csv,
-         _ => crate::mcp::compress::CompressionFormat::None,
-     });
+ let compression = s.response_compression.as_ref().and_then(|rc| {
+     if !rc.enabled {
+         return None;
+     }
+     match rc.format.as_str() {
+         "markdown" => Some(crate::mcp::compress::CompressionFormat::Markdown),
+         "tsv" => Some(crate::mcp::compress::CompressionFormat::Tsv),
+         "csv" => Some(crate::mcp::compress::CompressionFormat::Csv),
+         "none" | "" => None,
+         _ => None,
+     }
+ });

Comment on lines +46 to +73
            // Look for a top-level key whose value is an array of objects
            for (key, value) in obj.iter() {
                if let Value::Array(arr) = value
                    && arr.iter().all(|v| v.is_object())
                    && !arr.is_empty()
                {
                    let mut result = String::new();

                    // Add scalar fields as header lines
                    for (k, v) in obj.iter() {
                        if k != key && !v.is_array() {
                            result.push_str(&format!("{}: {}\n", k, render_value(v)));
                        }
                    }
                    if !result.is_empty() {
                        result.push('\n');
                    }

                    result.push_str(&convert_array_to_table(arr, format));
                    return Some(result);
                }
            }
            None
        },
        _ => None,
    }
}

Copilot AI Apr 14, 2026

For wrapper objects, the implementation compresses only the first top-level array-of-objects field it encounters and ignores any additional array fields. The PR description/docs mention handling “array fields” (plural), and some APIs return multiple tabular arrays (e.g., items + included). Consider iterating all qualifying array fields (or applying a deterministic selection like items/results/data) so behavior is predictable and matches the documented intent.

Suggested change
-             // Look for a top-level key whose value is an array of objects
-             for (key, value) in obj.iter() {
-                 if let Value::Array(arr) = value
-                     && arr.iter().all(|v| v.is_object())
-                     && !arr.is_empty()
-                 {
-                     let mut result = String::new();
-                     // Add scalar fields as header lines
-                     for (k, v) in obj.iter() {
-                         if k != key && !v.is_array() {
-                             result.push_str(&format!("{}: {}\n", k, render_value(v)));
-                         }
-                     }
-                     if !result.is_empty() {
-                         result.push('\n');
-                     }
-                     result.push_str(&convert_array_to_table(arr, format));
-                     return Some(result);
-                 }
-             }
-             None
-         },
-         _ => None,
-     }
- }
+             let array_fields = collect_array_object_fields(obj);
+             if array_fields.is_empty() {
+                 return None;
+             }
+             let mut result = String::new();
+             // Add scalar fields as header lines
+             for (k, v) in obj.iter() {
+                 if !v.is_array() {
+                     result.push_str(&format!("{}: {}\n", k, render_value(v)));
+                 }
+             }
+             if !result.is_empty() {
+                 result.push('\n');
+             }
+             let multiple_arrays = array_fields.len() > 1;
+             for (index, (key, arr)) in array_fields.iter().enumerate() {
+                 if multiple_arrays {
+                     result.push_str(&format!("{}:\n", key));
+                 }
+                 result.push_str(&convert_array_to_table(arr, format));
+                 if index + 1 < array_fields.len() {
+                     result.push_str("\n\n");
+                 }
+             }
+             Some(result)
+         },
+         _ => None,
+     }
+ }
+ fn collect_array_object_fields<'a>(
+     obj: &'a serde_json::Map<String, Value>,
+ ) -> Vec<(&'a String, &'a Vec<Value>)> {
+     obj.iter()
+         .filter_map(|(key, value)| match value {
+             Value::Array(arr) if !arr.is_empty() && arr.iter().all(|v| v.is_object()) => {
+                 Some((key, arr))
+             },
+             _ => None,
+         })
+         .collect()
+ }

@@ -0,0 +1,125 @@
#[cfg(test)]
mod tests {
    use super::*;

Copilot AI Apr 14, 2026

This file’s tests won’t run as written: crates/agentgateway/src/mcp/mod.rs does not declare mod compress_tests, so this module isn’t compiled, and use super::*; would also not bring compress_response/CompressionFormat into scope even if it were. Either remove this file and keep the unit tests in compress.rs, or add the proper #[cfg(test)] mod compress_tests; wiring and import from crate::mcp::compress explicitly.

Suggested change
-     use super::*;
+     use crate::mcp::compress::{compress_response, CompressionFormat};

henrikrexed and others added 10 commits April 17, 2026 08:26
Add response compression for MCP tool call results, reducing token
usage when LLMs consume structured JSON responses. Supports three
output formats: markdown tables, TSV, and CSV.

The compression is configurable per-backend via the `responseCompression`
field in both static config and the AgentgatewayBackend CRD. When
enabled, JSON array/object responses from MCP tool calls are
automatically converted to the specified tabular format.

Key changes:
- New `compress` module with format conversion logic and tests
- Handler integration to compress CallToolResult content
- Proto/xDS extension for responseCompression configuration
- Test helpers and snapshot updates for the new field

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Signed-off-by: Henrik Rexed <henrik.rexed@gmail.com>

Wire the responseCompression configuration from the Kubernetes CRD
through the xDS protocol to the proxy. Adds the ResponseCompression
type to the backend spec with enabled/format fields, updates the
deepcopy generator output, Helm CRD templates, and the syncer
translation layer.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Signed-off-by: Henrik Rexed <henrik.rexed@gmail.com>

Create proper child spans for correlated telemetry events instead of
logging them as independent entries. This improves trace correlation
for LLM calls, MCP operations, and guardrail checks.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Signed-off-by: Henrik Rexed <henrik.rexed@gmail.com>

Add Prometheus metrics to track compression operations: request
counts by format, compression ratios, original/compressed sizes,
and processing duration. Enables monitoring of compression
effectiveness across backends.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Signed-off-by: Henrik Rexed <henrik.rexed@gmail.com>

Document the response compression feature including configuration
options, supported formats, metrics, and architecture overview.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Signed-off-by: Henrik Rexed <henrik.rexed@gmail.com>

- Delete orphan compress_tests.rs (tests already inline in compress.rs)
- Add cell escaping: pipe chars in markdown, tabs/newlines in TSV
- Skip compression when result is larger than original
- Simplify metrics labels to target+format (remove unused gateway/listener/route)
- Fix xDS format mapping: unknown format with enabled=true treated as disabled
- Wire responseCompression for selector-based targets in translate.go
- Remove unwired Protocol field from McpTargetSelector CRD
- Fix docs: note lossy summarization for nested objects/large arrays

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Signed-off-by: Henrik Rexed <henrik.rexed@gmail.com>

PolicyClient and Relay::new() gained a span_writer field and a 4th
metrics argument after the rebase; six test call sites were not updated.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Signed-off-by: Henrik Rexed <henrik.rexed@gmail.com>

…initializers

OIDC and LLM tests were missing the span_writer field on PolicyClient,
and MCP tests were missing response_compression on SseTargetSpec and
OpenAPITarget, all added during the compression feature rebase.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Signed-off-by: Henrik Rexed <henrik.rexed@gmail.com>

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Signed-off-by: Henrik Rexed <henrik.rexed@gmail.com>

- Update new stateless_multiplex_delete_session_skips_uninitialized_targets
  test to include span_writer + metrics args after rebase onto main
- Collapse consecutive str::replace in escape_tsv (clippy
  collapsible_str_replace under -D warnings)

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Signed-off-by: Henrik Rexed <henrik.rexed@gmail.com>
@henrikrexed henrikrexed force-pushed the feat/mcp-response-compression-upstream branch from e10bde4 to 7ce851c Compare April 17, 2026 06:35