384 changes: 225 additions & 159 deletions api/resource.pb.go

Large diffs are not rendered by default.

11 changes: 11 additions & 0 deletions api/resource_json.gen.go

Some generated files are not rendered by default.

3 changes: 2 additions & 1 deletion architecture/README.md
@@ -4,4 +4,5 @@ This folder contains developer-facing documentation on the project architecture.

 Recommended reading order:
 1. [Configuration](configuration.md)
-1. [CEL](cel.md)
+1. [CEL](cel.md)
+1. [MCP Response Compression](mcp-response-compression.md)
74 changes: 74 additions & 0 deletions architecture/mcp-response-compression.md
@@ -0,0 +1,74 @@
# MCP Response Compression

MCP tool call responses frequently contain large JSON payloads (lists of database rows, API results, file metadata) that take up a large share of the context window when read by LLM-based agents. Response compression converts these payloads into compact tabular formats (markdown, TSV, CSV) at the proxy layer, reducing token usage while preserving the tabular structure. The transformation is lossy: nested objects are summarized as `{...}` and large arrays are truncated, so the output is optimized for LLM consumption rather than for lossless encoding.

This document covers the design and architecture of the compression pipeline.

## Design Decisions

### Why at the proxy layer

Compression could happen at the MCP server, the client, or the proxy. Doing it at the proxy has several advantages:

* **No upstream changes required.** MCP servers return standard JSON; compression is transparent.
* **Per-target configuration.** Different backends may benefit from different formats — a data-heavy API might use TSV while a human-readable tool uses markdown.
* **Consistent behavior.** All clients benefit without each needing its own compression logic.

The tradeoff is that the proxy must parse and re-serialize JSON, adding latency proportional to response size. In practice this is small relative to the upstream call and LLM processing time.

### Format selection

The three formats target different consumption patterns:

* **Markdown** — best for LLMs that handle markdown well (most do). Preserves readability.
* **TSV** — minimal overhead, no escaping needed for most data. Good for structured pipelines.
* **CSV** — standard interchange format with proper escaping. Useful when downstream tooling expects CSV.

The `none` default means compression is opt-in; existing behavior is unchanged.
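
To make the difference between the formats concrete, here is a minimal sketch of rendering one row in each of them. It is an illustration only: `Format`, `csv_field`, `tsv_field`, and `render_row` are hypothetical names, and replacing tabs and line breaks with spaces in TSV output is an assumption, not something the real converter is known to do.

```rust
/// Output formats named in this document (the `none` setting is omitted here).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Format {
    Markdown,
    Tsv,
    Csv,
}

/// CSV: quote a field when it contains a comma, a double quote, or a line
/// break, doubling any embedded quotes (RFC 4180 style).
fn csv_field(s: &str) -> String {
    if s.chars().any(|c| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", s.replace('"', "\"\""))
    } else {
        s.to_string()
    }
}

/// TSV: no quoting; tabs and line breaks inside a field become spaces.
/// (Assumption: the real converter may treat these characters differently.)
fn tsv_field(s: &str) -> String {
    s.chars()
        .map(|c| if matches!(c, '\t' | '\n' | '\r') { ' ' } else { c })
        .collect()
}

/// Render one row of already-stringified cells in the chosen format.
fn render_row(format: Format, cells: &[&str]) -> String {
    match format {
        Format::Markdown => format!("| {} |", cells.join(" | ")),
        Format::Tsv => cells.iter().map(|c| tsv_field(c)).collect::<Vec<_>>().join("\t"),
        Format::Csv => cells.iter().map(|c| csv_field(c)).collect::<Vec<_>>().join(","),
    }
}

fn main() {
    let row = ["42", "Ada, Countess of Lovelace", "mathematician"];
    for format in [Format::Markdown, Format::Tsv, Format::Csv] {
        println!("{:?}: {}", format, render_row(format, &row));
    }
}
```

Only the CSV path needs quoting logic, which is why TSV is described above as the lowest-overhead option.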

### What gets compressed

Compression targets `CallToolResult` messages containing text content that parses as JSON. The converter handles three shapes:

* **Arrays of objects** — the common case (e.g., database query results). Each object becomes a row, keys become column headers.
* **Wrapper objects with array fields** — scalar fields are preserved as header lines above the table, array fields are rendered as tables. This handles paginated API responses that wrap results in metadata.
* **Nested values** — arrays of 5 or fewer items are shown inline; larger arrays show the first 5 items plus a count. Nested objects display as `{...}`.

Non-tabular JSON (scalars, deeply nested structures) passes through unchanged. This is intentional — forcing non-tabular data into a table would lose information.
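
The array-of-objects case and the nested-value rules can be sketched as follows. This is a simplified illustration, not the code in `compress.rs`: the `Json` enum is a hypothetical stand-in for a parsed JSON value (the real module more likely works on a JSON library type), the column ordering is a guess, and the ` (+N more)` suffix is an assumed rendering of "plus a count".

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a parsed JSON value.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Array(Vec<Json>),
    Object(BTreeMap<String, Json>),
}

/// Arrays with at most this many items are shown in full.
const INLINE_LIMIT: usize = 5;

/// Render one table cell: scalars as text, nested objects as `{...}`,
/// arrays inline up to INLINE_LIMIT items followed by a count of the rest.
fn cell(v: &Json) -> String {
    match v {
        Json::Null => String::new(),
        Json::Bool(b) => b.to_string(),
        Json::Num(n) => n.to_string(),
        Json::Str(s) => s.clone(),
        Json::Object(_) => "{...}".to_string(),
        Json::Array(items) => {
            let shown: Vec<String> = items.iter().take(INLINE_LIMIT).map(cell).collect();
            let mut out = shown.join(", ");
            if items.len() > INLINE_LIMIT {
                // Assumed wording for the truncation count.
                out.push_str(&format!(" (+{} more)", items.len() - INLINE_LIMIT));
            }
            out
        }
    }
}

/// Convert an array of objects into a markdown table. Returns `None` for
/// any other shape, which a caller would treat as "pass through unchanged".
/// Note: `|` inside cell text is not escaped in this sketch.
fn to_markdown(v: &Json) -> Option<String> {
    let Json::Array(rows) = v else { return None };
    if rows.is_empty() {
        return None;
    }
    // Collect column names in first-seen order across all rows.
    let mut cols: Vec<&str> = Vec::new();
    for row in rows {
        let Json::Object(fields) = row else { return None };
        for key in fields.keys() {
            if !cols.contains(&key.as_str()) {
                cols.push(key.as_str());
            }
        }
    }
    let mut out = format!("| {} |\n", cols.join(" | "));
    out.push_str(&format!("|{}\n", " --- |".repeat(cols.len())));
    for row in rows {
        let Json::Object(fields) = row else { return None };
        let cells: Vec<String> = cols
            .iter()
            .map(|c| fields.get(*c).map(cell).unwrap_or_default())
            .collect();
        out.push_str(&format!("| {} |\n", cells.join(" | ")));
    }
    Some(out)
}

/// Small helper for building objects in examples.
fn obj(pairs: &[(&str, Json)]) -> Json {
    Json::Object(pairs.iter().map(|(k, v)| (k.to_string(), v.clone())).collect())
}

fn main() {
    let rows = Json::Array(vec![
        obj(&[("id", Json::Num(1.0)), ("name", Json::Str("ada".into()))]),
        obj(&[("id", Json::Num(2.0)), ("name", Json::Str("bob".into())), ("meta", obj(&[]))]),
    ]);
    print!("{}", to_markdown(&rows).unwrap_or_default());
}
```

Returning `None` for non-tabular input keeps the pass-through decision with the caller, which matches the intent that such data is never forced into a table.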

## Architecture

### Configuration flow

Response compression follows the same configuration pattern as other per-target settings:

1. **CRD** — `responseCompression` on [`AgentgatewayBackend`](../controller/api/v1alpha1/agentgateway/agentgateway_backend_types.go) MCP targets, with `enabled` (bool) and `format` (string) fields.
2. **Controller** — [`translate.go`](../controller/pkg/syncer/backend/translate.go) maps the CRD field into the xDS [`MCPTarget.ResponseCompression`](../crates/protos/proto/resource.proto) proto message.
3. **xDS → IR** — [`agent_xds.rs`](../crates/agentgateway/src/types/agent_xds.rs) converts the proto format string to the internal `CompressionFormat` enum on [`McpTarget`](../crates/agentgateway/src/types/agent.rs).

This maintains the project's design philosophy of nearly direct CRD → xDS → IR mappings.
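
The last step of that flow, turning the proto's `enabled` flag and format string into an enum, might look roughly like the sketch below. Only the `CompressionFormat` name comes from this document; the variant names, the `ProtoResponseCompression` struct, the `to_ir` function, and the choice to fall back to no compression on an unrecognized string are all assumptions made for illustration.

```rust
/// Internal format enum. The name is from this document; the variants
/// and the `None` default are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum CompressionFormat {
    #[default]
    None,
    Markdown,
    Tsv,
    Csv,
}

/// Hypothetical stand-in for the xDS `MCPTarget.ResponseCompression` message.
struct ProtoResponseCompression {
    enabled: bool,
    format: String,
}

/// Map the optional proto message to the internal enum.
/// Assumption: a missing message, `enabled: false`, or an unrecognized
/// format string all mean "no compression".
fn to_ir(rc: Option<&ProtoResponseCompression>) -> CompressionFormat {
    match rc {
        Some(rc) if rc.enabled => match rc.format.as_str() {
            "markdown" => CompressionFormat::Markdown,
            "tsv" => CompressionFormat::Tsv,
            "csv" => CompressionFormat::Csv,
            _ => CompressionFormat::None,
        },
        _ => CompressionFormat::None,
    }
}

fn main() {
    let rc = ProtoResponseCompression {
        enabled: true,
        format: "tsv".to_string(),
    };
    println!("{:?}", to_ir(Some(&rc)));
    println!("{:?}", to_ir(None));
}
```

Resolving the string once at configuration time means the runtime path only has to match on an enum.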

### Runtime pipeline

The compression module lives in [`mcp/compress.rs`](../crates/agentgateway/src/mcp/compress.rs). At runtime:

1. The MCP handler in [`handler.rs`](../crates/agentgateway/src/mcp/handler.rs) checks the target's `response_compression` field.
2. If enabled, `compress_stream()` wraps the upstream response stream. For each `ServerJsonRpcMessage` containing a `CallToolResult` with text content, it calls `compress_response()`.
3. `compress_response()` attempts JSON parsing. If the content is valid JSON with tabular structure, it converts to the target format. Otherwise the content passes through unchanged.
4. Metrics are recorded for each attempt — see the [metrics section](#metrics) below.

The stream wrapping approach means compression happens inline without buffering the entire response, though individual tool call results are fully parsed.
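
The wrapping idea can be shown with a plain iterator adapter. This is a deliberately simplified sketch: the real `compress_stream()` operates on an async stream of `ServerJsonRpcMessage` values, whereas the `Message` enum, the generic `convert` parameter, and `toy_convert` below are hypothetical stand-ins, and the signature here is not the real one.

```rust
/// Hypothetical, simplified message type standing in for the real
/// JSON-RPC messages on the upstream response stream.
#[derive(Debug, Clone, PartialEq)]
enum Message {
    /// A tool call result carrying text content.
    ToolResult { text: String },
    /// Any other message; always forwarded untouched.
    Other(String),
}

/// Wrap an upstream sequence of messages. For each tool result, try the
/// converter; keep the original text when it returns `None`.
/// `Iterator::map` is lazy, so messages are handled one at a time as the
/// consumer pulls them rather than being collected first.
fn compress_stream<I, F>(upstream: I, convert: F) -> impl Iterator<Item = Message>
where
    I: Iterator<Item = Message>,
    F: Fn(&str) -> Option<String>,
{
    upstream.map(move |msg| match msg {
        Message::ToolResult { text } => {
            let text = convert(&text).unwrap_or(text);
            Message::ToolResult { text }
        }
        other => other,
    })
}

/// Toy converter for the example: "converts" only text with a `json:` prefix.
fn toy_convert(text: &str) -> Option<String> {
    text.strip_prefix("json:").map(|rest| rest.to_uppercase())
}

fn main() {
    let upstream = vec![
        Message::ToolResult { text: "json:rows".to_string() },
        Message::ToolResult { text: "plain text".to_string() },
        Message::Other("ping".to_string()),
    ];
    for msg in compress_stream(upstream.into_iter(), toy_convert) {
        println!("{:?}", msg);
    }
}
```

The fallback to the original text is what makes the wrapper safe to enable: a response the converter cannot handle reaches the client exactly as the upstream sent it.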

### Metrics

Compression exposes Prometheus metrics through the standard agentgateway metrics registry in [`telemetry/metrics.rs`](../crates/agentgateway/src/telemetry/metrics.rs):

* `mcp_response_compression_total` / `mcp_response_compression_skipped_total` — counts of compressed vs. skipped responses.
* `mcp_response_compression_original_bytes` / `mcp_response_compression_compressed_bytes` — size histograms.
* `mcp_response_compression_ratio` — compression ratio (0.0–1.0).

All metrics carry `target` and `format` labels.

## Testing

Unit tests in [`compress_tests.rs`](../crates/agentgateway/src/mcp/compress_tests.rs) cover the core conversion logic: arrays of objects, nested values, wrapper objects, and non-convertible inputs. Integration with the handler is tested through the existing MCP test infrastructure in [`mcp_tests.rs`](../crates/agentgateway/src/mcp/mcp_tests.rs).
15 changes: 15 additions & 0 deletions controller/api/v1alpha1/agentgateway/agentgateway_backend_types.go
@@ -444,6 +444,21 @@ type McpTargetSelector struct {
	// instead.
	// +optional
	Static *McpTarget `json:"static,omitempty"`

	// ResponseCompression configures response compression for the target.
	// +optional
	ResponseCompression *ResponseCompression `json:"responseCompression,omitempty"`
}

// ResponseCompression configures response compression.
type ResponseCompression struct {
	// Enabled determines if response compression is enabled.
	// +optional
	Enabled bool `json:"enabled,omitempty"`

	// Format specifies the format to use for compression (e.g., markdown).
	// +optional
	Format string `json:"format,omitempty"`
}

const (
20 changes: 20 additions & 0 deletions controller/api/v1alpha1/agentgateway/zz_generated.deepcopy.go

Some generated files are not rendered by default.

@@ -6764,6 +6764,19 @@ spec:
  minLength: 1
  pattern: ^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$
  type: string
responseCompression:
  description: ResponseCompression configures response compression
    for the target.
  properties:
    enabled:
      description: Enabled determines if response compression
        is enabled.
      type: boolean
    format:
      description: Format specifies the format to use for
        compression (e.g., markdown).
      type: string
  type: object
selector:
  description: |-
    `selector` is the label selector used to select `Service` resources.
16 changes: 16 additions & 0 deletions controller/pkg/syncer/backend/translate.go
@@ -295,12 +295,28 @@ func TranslateMCPBackends(ctx plugins.PolicyCtx, be *agentgateway.AgentgatewayBa
				mcpTarget.Protocol = api.MCPTarget_STREAMABLE_HTTP
			}

			if rc := target.ResponseCompression; rc != nil && rc.Enabled {
				mcpTarget.ResponseCompression = &api.MCPTarget_ResponseCompression{
					Enabled: rc.Enabled,
					Format:  rc.Format,
				}
			}

			mcpTargets = append(mcpTargets, mcpTarget)
		} else if s := target.Selector; s != nil {
			targets, err := TranslateMCPSelectorTargets(ctx, be.Namespace, target.Selector)
			if err != nil {
				return nil, err
			}
			// Apply responseCompression from the target selector to each generated target
			if rc := target.ResponseCompression; rc != nil && rc.Enabled {
				for _, t := range targets {
					t.ResponseCompression = &api.MCPTarget_ResponseCompression{
						Enabled: rc.Enabled,
						Format:  rc.Format,
					}
				}
			}
			mcpTargets = append(mcpTargets, targets...)
		}
	}
1 change: 1 addition & 0 deletions crates/agentgateway/src/http/oidc/tests.rs
@@ -54,6 +54,7 @@ fn policy_client() -> crate::proxy::httpproxy::PolicyClient {
	let proxy = setup_proxy_test("{}").expect("proxy test harness");
	crate::proxy::httpproxy::PolicyClient {
		inputs: proxy.inputs(),
		span_writer: Default::default(),
	}
}

3 changes: 2 additions & 1 deletion crates/agentgateway/src/llm/mod.rs
@@ -819,8 +819,9 @@ impl AIProvider {
 		if original_format.supports_prompt_guard() {
 			let http_headers = &parts.headers;
 			let claims = parts.extensions.get::<Claims>().cloned();
+			let sw = log.as_ref().map(|l| l.span_writer()).unwrap_or_default();
 			if let Some(dr) = p
-				.apply_prompt_guard(backend_info, &mut req, http_headers, claims)
+				.apply_prompt_guard(backend_info, &mut req, http_headers, claims, sw)
 				.await
 				.map_err(|e| {
 					warn!("failed to call prompt guard webhook: {e}");
5 changes: 5 additions & 0 deletions crates/agentgateway/src/llm/policy/mod.rs
@@ -369,9 +369,11 @@ impl Policy {
		req: &mut dyn RequestType,
		http_headers: &HeaderMap,
		claims: Option<Claims>,
		span_writer: crate::telemetry::log::SpanWriter,
	) -> anyhow::Result<Option<Response>> {
		let client = PolicyClient {
			inputs: backend_info.inputs.clone(),
			span_writer,
		};
		for g in self
			.prompt_guard
@@ -887,6 +889,9 @@ impl Policy {
		phase: crate::telemetry::metrics::GuardrailPhase,
		action: crate::telemetry::metrics::GuardrailAction,
	) {
		let _span = client
			.span_writer
			.start(format!("guardrail:{phase:?}:{action:?}"));
		client
			.inputs
			.metrics
1 change: 1 addition & 0 deletions crates/agentgateway/src/llm/tests.rs
@@ -977,6 +977,7 @@ async fn process_response_routes_streaming_error_to_buffered_path() {

	let client = PolicyClient {
		inputs: setup_proxy_test("{}").unwrap().pi,
		span_writer: Default::default(),
	};

	let result = bedrock