fix(anthropic): prevent cache_control overload with LiteLLM-style proxies (#13477)
Open

DoubleWhopperS wants to merge 1 commit into NousResearch:main
When talking to a third-party Anthropic-compatible endpoint (LiteLLM, self-hosted proxies, etc.), the proxy injects its own cache_control markers before forwarding to Anthropic/Bedrock. Whatever the client sends stacks on top, deterministically tripping the 4-breakpoint limit with HTTP 400 "A maximum of 4 blocks with cache_control may be provided".

- prompt_caching.py: _strip_all_cache_control scrubs accumulated markers from session history before reapplying fresh ones each turn.
- anthropic_adapter.py: _cap_cache_control_markers is invoked at the end of build_anthropic_kwargs. For third-party endpoints detected via _is_third_party_anthropic_endpoint(base_url), all markers are stripped so the proxy manages caching on its side. Native Anthropic and OpenRouter paths keep the existing 4-marker strategy unchanged.
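The stacking failure mode can be illustrated with a toy marker count over a simplified Anthropic-style payload (the helper and the payload shapes here are illustrative, not code from the PR):

```python
# Hypothetical helper: counts cache_control breakpoints in an
# Anthropic-style request, to show how client + proxy markers stack.
def count_cache_control(system, messages):
    count = sum(1 for block in system if "cache_control" in block)
    for msg in messages:
        content = msg.get("content", [])
        if isinstance(content, list):
            count += sum(
                1
                for block in content
                if isinstance(block, dict) and "cache_control" in block
            )
    return count

# The client sends the usual 4 breakpoints...
system = [{"type": "text", "text": "You are Hermes.",
           "cache_control": {"type": "ephemeral"}}]
messages = [
    {"role": "user", "content": [{"type": "text", "text": "hi",
                                  "cache_control": {"type": "ephemeral"}}]},
    {"role": "assistant", "content": [{"type": "text", "text": "hello",
                                       "cache_control": {"type": "ephemeral"}}]},
    {"role": "user", "content": [{"type": "text", "text": "run the tool",
                                  "cache_control": {"type": "ephemeral"}}]},
]
# ...and a LiteLLM-style proxy injects two more server-side before forwarding:
messages[0]["content"].append({"type": "text", "text": "(proxy)",
                               "cache_control": {"type": "ephemeral"}})
messages[-1]["content"].append({"type": "text", "text": "(proxy)",
                                "cache_control": {"type": "ephemeral"}})

print(count_cache_control(system, messages))  # 6: two past the limit of 4
```

Anthropic rejects any request whose total breakpoint count exceeds 4, regardless of which party added the markers, which is why the failure is deterministic rather than flaky.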
TL;DR
When Hermes uses provider: anthropic + a non-api.anthropic.com base_url (e.g., llm.echo.tech or any self-hosted LiteLLM), every tool-heavy turn deterministically fails with HTTP 400 "A maximum of 4 blocks with cache_control may be provided".

Root cause: the proxy injects its own cache_control markers server-side before forwarding to Anthropic/Bedrock. Whatever we send from the client stacks on top of what the proxy adds, blowing past the 4-breakpoint limit.

This PR adds two layered defenses:
- _strip_all_cache_control in prompt_caching.py: scrubs accumulated markers from session history before reapplying fresh ones each turn (prevents intra-client accumulation across turns).
- _cap_cache_control_markers in anthropic_adapter.py, invoked at the end of build_anthropic_kwargs: caps total markers in the outgoing request; strips all of them when _is_third_party_anthropic_endpoint(base_url) is true.

Reproduction
terminal repeatedly).

Debug trail (for reviewers who want the receipts)
Narrowing the root cause took three iterations. Each is concrete: comparing the client-side marker count against the server's "Found N" number is what pinpointed the proxy as the party adding the extra markers.
- _strip_all_cache_control on load)
- build_anthropic_kwargs (max_markers=4)

Cost: we lose client-side prompt-prefix caching hints on LiteLLM-style endpoints. That's acceptable: LiteLLM proxies typically implement their own server-side prompt caching anyway, and correctness beats an optimization that deterministically 400s.
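For reference, the endpoint detection could plausibly look like a hostname check; this is a sketch under that assumption, and the shipped _is_third_party_anthropic_endpoint may differ:

```python
from urllib.parse import urlparse

# Endpoints we treat as first-party Anthropic (assumption for this sketch).
KNOWN_FIRST_PARTY = {"api.anthropic.com"}

def is_third_party_anthropic_endpoint(base_url):
    """Return True when base_url points at a LiteLLM-style proxy rather
    than Anthropic itself. Sketch only; not the PR's exact implementation."""
    if not base_url:
        return False  # no override: the SDK talks to Anthropic directly
    host = urlparse(base_url).hostname or ""
    return host not in KNOWN_FIRST_PARTY

print(is_third_party_anthropic_endpoint("https://api.anthropic.com"))  # False
print(is_third_party_anthropic_endpoint("https://llm.echo.tech/v1"))   # True
```

A hostname allowlist keeps the default path (no base_url, or the official endpoint) on the existing 4-marker strategy while routing every other host through the strip-everything behavior.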
Native Anthropic and OpenRouter paths keep the existing 4-marker strategy unchanged.
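The two layered defenses described above might be sketched as follows; helper names match the PR's description, but the bodies are illustrative assumptions, not the actual diff:

```python
def strip_all_cache_control(messages):
    """Remove every cache_control marker from session history, so fresh
    markers can be reapplied each turn (sketch of _strip_all_cache_control)."""
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list):
            for block in content:
                if isinstance(block, dict):
                    block.pop("cache_control", None)
    return messages

def cap_cache_control_markers(messages, max_markers=4):
    """Keep at most max_markers cache_control entries in the outgoing
    request, dropping the earliest extras; max_markers=0 strips everything,
    which is the behavior used for third-party proxies (sketch of
    _cap_cache_control_markers)."""
    marked = [
        block
        for msg in messages
        for block in (msg.get("content") or [])
        if isinstance(block, dict) and "cache_control" in block
    ]
    excess = max(0, len(marked) - max_markers)
    for block in marked[:excess]:
        block.pop("cache_control", None)
    return messages
```

Dropping the earliest markers first preserves the most recent breakpoints, which cover the longest reusable prompt prefix on native Anthropic paths.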
Related
A cleaner long-term fix may be to teach _anthropic_prompt_cache_policy to return (False, False) for third-party LiteLLM-style endpoints, i.e., don't emit markers at all rather than cap them after the fact. Happy to follow up with that approach if the maintainers prefer. This PR keeps the change minimal and scoped to the one call site that actually ships requests.

Test plan
- Against llm.echo.tech: an 8-tool-call skill invocation completed in 24s with zero HTTP 400s. Before this patch, the same skill failed on turn 5.
- python3 -m py_compile agent/anthropic_adapter.py agent/prompt_caching.py: clean.
- The new capping logic operates on kwargs and is trivially testable.
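The cleaner long-term fix mentioned under Related could look roughly like this; the policy name comes from the PR, but the signature, the (cache_system, cache_messages) reading of the tuple, and the hostname check are all assumptions for illustration:

```python
from urllib.parse import urlparse

def anthropic_prompt_cache_policy(base_url):
    """Hypothetical sketch: decide whether to emit cache_control markers at
    all, instead of capping them after the fact. Returns a pair of flags,
    read here as (cache_system, cache_messages)."""
    host = urlparse(base_url or "https://api.anthropic.com").hostname
    if host != "api.anthropic.com":
        # Third-party LiteLLM-style proxy: emit no markers, let the proxy
        # manage caching entirely on its side.
        return (False, False)
    # Native endpoint: keep the existing 4-marker strategy upstream.
    return (True, True)
```

Deciding at policy time keeps build_anthropic_kwargs free of post-hoc stripping, at the cost of threading base_url into one more call site.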