fix: prevent wrong context lengths for OpenRouter models #13459

Closed
StefanIsMe wants to merge 2 commits into NousResearch:main from StefanIsMe:fix/compression-model-context-cap

Conversation

@StefanIsMe
Contributor

Summary

Fixes a bug where OpenRouter models receive incorrect (too low) context lengths, causing premature context compression and LLM performance degradation.

The Error

Before this fix

Step 6 in get_model_context_length() (the provider-aware DEFAULT_CONTEXT_LENGTHS lookup) checked both provider name and model name:

for default_model, length in sorted(DEFAULT_CONTEXT_LENGTHS.items(), ...):
    if default_model in prov_lower or default_model in model_lower:
        return length

This caused two failure modes:

Bug 1 (v2): Keys belonging to unrelated providers match model names. With effective_provider="minimax" and model="hailuo-mini2", the key "mimo" (in DEFAULT_CONTEXT_LENGTHS under xiaomi/mimo-v2-pro) matched "minimo" (because "mimo" in "minimo"), so the lookup returned 4K instead of the 1M reported by OpenRouter metadata.

Bug 2 (previous iteration): Broad family keys match model names via OpenRouter. With effective_provider="openrouter" and model="google/gemini-2.0-flash-lite-001", the broad key "gemini" (from "providers/gemini-2.0-flash:experimental") matched the model name substring "gemini-2.0-flash-lite-001", so the lookup returned the hardcoded 1M default instead of the OpenRouter live API metadata for google/gemini-2.0-flash-lite-001.

This affected all model families with broad keys — claude, grok, gemini, mimo, etc.
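
To make the second failure mode concrete, here is a minimal sketch of the check as this description characterizes it. It is not the project's code: the table is a hypothetical three-entry subset with values taken from the Before/After table in this PR, `lookup_before` is an illustrative name, and the longest-key-first sort order is an assumption, since the snippet above elides the sort arguments.

```python
# Hypothetical subset of DEFAULT_CONTEXT_LENGTHS; values mirror this PR's table.
DEFAULT_CONTEXT_LENGTHS = {
    "claude": 200_000,
    "grok": 128_000,
    "gemini": 1_000_000,
}


def lookup_before(provider: str, model: str):
    """Sketch of the described check: keys are tested against provider AND model."""
    prov_lower = provider.lower()
    model_lower = model.lower()
    # Assumption: longest keys are tried first; the PR elides the sort arguments.
    ordered = sorted(
        DEFAULT_CONTEXT_LENGTHS.items(), key=lambda kv: len(kv[0]), reverse=True
    )
    for default_model, length in ordered:
        # Substring test: a broad family key also hits any model name containing it.
        if default_model in prov_lower or default_model in model_lower:
            return length
    return None


# "gemini" is not in "openrouter", but it is a substring of the model name,
# so the hardcoded family default is returned before any live lookup can run.
print(lookup_before("openrouter", "google/gemini-2.0-flash-lite-001"))
```

Under these assumptions, any model routed through an aggregator inherits the family default whenever its name contains a family key, regardless of what the aggregator itself reports.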

The fix

Step 6 now checks DEFAULT_CONTEXT_LENGTHS keys against the provider name only:

for default_model, length in sorted(DEFAULT_CONTEXT_LENGTHS.items(), ...):
    if default_model in prov_lower:
        return length

The step requires:

  1. An effective_provider to be set (gated by the outer if effective_provider:)
  2. The DEFAULT_CONTEXT_LENGTHS key to appear in the provider name string

Model-name matching is deferred to step 8 (the no-provider fallback), which uses the same sorted substring matching but only runs when no provider is set.
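
The ordering described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual get_model_context_length(): `resolve_context_length` is a hypothetical name, the `live_metadata` dict stands in for the step 7 OpenRouter fetch, the defaults table is a hypothetical subset, and the sort order is assumed because the PR elides it.

```python
# Hypothetical subset of DEFAULT_CONTEXT_LENGTHS; values mirror this PR's table.
DEFAULT_CONTEXT_LENGTHS = {
    "claude": 200_000,
    "grok": 128_000,
    "gemini": 1_000_000,
}


def _sorted_defaults():
    # Assumption: longest keys first; the PR elides the real sort arguments.
    return sorted(
        DEFAULT_CONTEXT_LENGTHS.items(), key=lambda kv: len(kv[0]), reverse=True
    )


def resolve_context_length(provider, model, live_metadata=None):
    """Sketch of steps 6 to 8 as described in this PR (names are illustrative)."""
    live_metadata = live_metadata or {}
    if provider:
        prov_lower = provider.lower()
        # Step 6: match default keys against the provider name only.
        for default_model, length in _sorted_defaults():
            if default_model in prov_lower:
                return length
        # Step 7 stand-in: a dict lookup replaces the live metadata fetch.
        return live_metadata.get(model)
    # Step 8: no provider set, so fall back to model-name substring matching.
    model_lower = model.lower()
    for default_model, length in _sorted_defaults():
        if default_model in model_lower:
            return length
    return None


# 42_000 is a placeholder value, not a real API result.
fake_live = {"google/gemini-2.0-flash-lite-001": 42_000}
print(resolve_context_length("gemini", "any-model"))
print(resolve_context_length("openrouter", "google/gemini-2.0-flash-lite-001", fake_live))
print(resolve_context_length(None, "grok-3"))
```

In this sketch a provider that matches no key falls through to the metadata stand-in, which is the behavior the description expects for OpenRouter.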

Before (broken) vs After (fixed)

| Scenario | Before | After |
| --- | --- | --- |
| minimax provider + hailuo-mini2 model | 4K (wrong: "mimo" matched "minimo") | 1M (correct: "minimax" matches provider) |
| openrouter + claude-sonnet-4 model | 200K (wrong: "claude" in model name) | OpenRouter live metadata (step 7) |
| openrouter + grok-3 model | 128K (wrong: "grok" in model name) | OpenRouter live metadata (step 7) |
| openrouter + gemini-2.0-flash model | 1M (wrong: "gemini" in model name) | OpenRouter live metadata (step 7) |
| anthropic provider + any model | 200K (correct: provider matches) | 200K (correct: unchanged) |
| grok provider + any model | 128K (correct: provider matches) | 128K (correct: unchanged) |
| gemini provider + any model | 1M (correct: provider matches) | 1M (correct: unchanged) |

Expected Outcome

  • Direct providers (minimax, anthropic, grok, gemini, etc.) still get their hardcoded defaults correctly via provider-name matching
  • OpenRouter models now always get OpenRouter's live API metadata from step 7, not stale hardcoded family defaults
  • The no-provider fallback (step 8) still works for models with no provider set, using the full sorted substring matching on the model name
  • No platform, network, or Python-version dependencies

Agnosticism efforts

The fix is deliberately minimal and provider-agnostic:

  1. No hardcoded provider names — the fix doesn't special-case "openrouter" or any specific provider. It simply stops the check from matching model names.
  2. No new constants or weights — single line change (default_model in prov_lower instead of default_model in prov_lower or default_model in model_lower).
  3. Preserves all existing behavior — provider-name matching still works exactly as before for direct providers. Only the broken model-name matching in the provider-aware step is removed.
  4. Platform-independent — pure Python string comparison, no OS or network dependencies.
  5. Test-agnostic — 27 passing tests validate the logic without mocking any specific provider.

Test plan

pytest tests/ -v
  • tests/test_model_metadata.py — context length lookup (18 tests)
  • tests/test_tool_executor.py — tool execution (20 tests)

Stefan added 2 commits April 21, 2026 13:32
When the main model has a large context window (e.g., 944K) but the
auxiliary compression model has a smaller context (e.g., 128K), the
threshold was being set to 100% of the aux model's context, leaving
no room for system prompt, compression instructions, and summary output
inside the auxiliary model's window.

Changes:
- Cap threshold at 85% of aux model context (safety margin)
- Two severity levels for the auto-correction:
  - Mismatch ratio <= 2x: silent (logger.info only, no user message)
  - Mismatch ratio > 2x: user-facing warning with actionable config fix
- Renamed 'Auto-lowered' to 'Auto-capped' for accuracy
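
The changes listed above could look roughly like the sketch below. It is not the run_agent.py code: `cap_compression_threshold` is a hypothetical helper, and the mismatch ratio is assumed to be the requested threshold divided by the auxiliary model's context, which the commit message does not define. Only the 85% margin and the 2x cutoff come from the commit message.

```python
import logging

logger = logging.getLogger("compression_cap_sketch")

AUX_CONTEXT_SAFETY_PERCENT = 85  # from the commit message
SEVERE_MISMATCH_RATIO = 2.0      # from the commit message


def cap_compression_threshold(requested_threshold: int, aux_context: int):
    """Return (threshold, user_warning_or_None). Hypothetical helper."""
    # Integer arithmetic avoids floating-point rounding on the cap itself.
    cap = aux_context * AUX_CONTEXT_SAFETY_PERCENT // 100
    if requested_threshold <= cap:
        return requested_threshold, None  # already fits, nothing to correct

    # Assumption: "mismatch ratio" means requested threshold / aux context.
    ratio = requested_threshold / aux_context
    if ratio <= SEVERE_MISMATCH_RATIO:
        # Mild mismatch: log quietly, no user-facing message.
        logger.info("Auto-capped compression threshold to %d tokens", cap)
        return cap, None

    # Severe mismatch: surface a warning the user can act on.
    warning = (
        f"Auto-capped compression threshold from {requested_threshold} to {cap} "
        f"tokens; the compression model context ({aux_context}) is much smaller. "
        "Consider adjusting the compression model or threshold in your config."
    )
    return cap, warning


threshold, warning = cap_compression_threshold(944_000, 128_000)
print(threshold, warning is not None)
```

Returning the warning instead of printing it keeps the sketch free of UI assumptions; the caller decides how to show it.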

This eliminates the startup warning spam for users whose compression
model context is reasonably close to the threshold (within 2x), while
still warning when there's a severe mismatch that needs config attention.

The previous fix checked BOTH provider name AND model name in the
provider-aware step (step 6). This caused a regression for OpenRouter
users: broad family keys like 'gemini', 'claude', 'grok' would match
against model names (e.g. 'google/gemini-2.0-flash') and return the
hardcoded family default instead of OpenRouter's live API metadata.

Now step 6 only checks DEFAULT_CONTEXT_LENGTHS keys against the
provider name (not the model name). Model-name matching is deferred
to step 8 (the no-provider fallback), which uses the same sorted
substring matching but only runs when no provider is set.

This ensures:
- Direct providers (minimax, anthropic, etc.) get their hardcoded defaults
- OpenRouter models get OpenRouter's live metadata (step 7)
- No-provider models get model-name matching (step 8)
- No platform, network, or Python-version dependencies
@teknium1
Contributor

Thanks for the PR, Stefan — appreciate the detailed writeup.

I traced this against current main before acting on it and the premise doesn't hold up, so I'm going to close this one.

On the model_metadata.py change:

The PR description describes fixing a step-6 check that supposedly had default_model in prov_lower or default_model in model_lower. That code isn't on current main — step 6 today is just fetch_model_metadata() (the OpenRouter live-metadata fetch), with no hardcoded-defaults logic running there. The "Before/After" table describes behavior that isn't currently present. E2E on origin/main:

  • minimax + hailuo-mini2 → 128,000 (not 4K)
  • openrouter + anthropic/claude-sonnet-4.6 → 200,000 (from live OR metadata)
  • openrouter + x-ai/grok-3 → 131,072 (from live OR metadata)
  • openrouter + google/gemini-2.0-flash-lite-001 → 1,000,000 (from live OR metadata)

The title says "OpenRouter models" but the change doesn't actually affect the OpenRouter path — when provider="openrouter", effective_provider stays openrouter (via _infer_provider_from_url), which isn't a key in DEFAULT_CONTEXT_LENGTHS, so the new block no-ops for OR users.

What the diff actually does is insert a new lookup tier between models_dev and the OR fetch that returns a family default when a direct native provider name matches (minimax → 204800, kimi → 262144, etc.). That's a narrower improvement than described, and since direct native providers already hit step 5 (models_dev) first for the cases that matter, the real-world gain is small (primarily minimax+hailuo-mini2 128K → 204800, which isn't even clearly correct — hailuo is a different product line).

On the run_agent.py change:

The 85%-of-aux-context cap for the compression threshold is a reasonable idea, but it's unrelated to context-length lookup and belongs in its own PR with the title matching its content.

On process:

Please verify claims against current main before writing a PR description based on a stale branch state. The merge base here is from Apr 20 and main has moved on — cross-check the "Before" column with an actual run on latest main next time.

No action needed on your end. Closing.

— Teknium

@teknium1 teknium1 closed this Apr 21, 2026