fix(inference): handle null content and propagate reasoning from thinking models #2353

Open

jgreer013 wants to merge 2 commits into main from jgreer013/fix-null-content-thinking-models

Conversation

@jgreer013
Contributor

@jgreer013 jgreer013 commented Apr 6, 2026

Summary

  • Add optional reasoning_content field to Message for per-turn reasoning/thinking content
  • Handle content: null in API responses (default to empty string)
  • Extract reasoning across all providers
  • Training pipeline: detect template reasoning support via Jinja2 AST, fallback to <think> tag injection
  • Strip reasoning keys from API requests (providers like Anthropic reject unknown fields)
  • Preserve reasoning_content in conversation utilities
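
As a rough sketch of the null-content handling and reasoning extraction described above (the function name and dict shape are illustrative, not the PR's actual code — the real logic lives in the inference engines and builds a `Message`):

```python
def parse_api_message(raw: dict) -> dict:
    """Build Message kwargs from a raw chat-completion message dict.

    Thinking models may return content: null when the token budget is
    exhausted by reasoning, so content defaults to an empty string.
    """
    content = raw.get("content")
    if content is None:  # explicit None check: "" is valid content
        content = ""
    # Prefer "reasoning", fall back to "reasoning_content".
    reasoning = raw.get("reasoning")
    if reasoning is None:
        reasoning = raw.get("reasoning_content")
    return {
        "role": raw.get("role", "assistant"),
        "content": content,
        "reasoning_content": reasoning,
    }
```

With this shape, a response like `{"content": null, "reasoning": "..."}` no longer raises a validation error when constructing a `Message`.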

In Scope

  • Response parsing: Extract reasoning from all providers into Message.reasoning_content
  • Training: Detect template support via Jinja2 AST, pass both reasoning_content and reasoning keys for native templates (Qwen3), fall back to <think> tag injection for unsupported templates
  • API safety: Reasoning keys stripped from message dicts sent to APIs (include_reasoning=False by default) — verified Anthropic rejects unknown fields
  • Backward compat: No regressions for non-thinking models across all tested providers
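
The API-safety behavior can be sketched as follows (a minimal illustration of the `include_reasoning=False` default described above; the helper name is hypothetical):

```python
# Keys that providers such as Anthropic reject as unknown fields.
_REASONING_KEYS = ("reasoning", "reasoning_content")

def to_api_dict(message: dict, include_reasoning: bool = False) -> dict:
    """Serialize a message dict for an API request, stripping
    reasoning keys unless the caller explicitly opts in."""
    if include_reasoning:
        return dict(message)
    return {k: v for k, v in message.items() if k not in _REASONING_KEYS}
```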

Out of Scope (Follow-ups)

  • Multi-turn reasoning round-tripping: Providers require opaque signatures/tokens to maintain reasoning state across turns (Anthropic: thinking_blocks with signature, Gemini: thoughtSignature, OpenAI: encrypted_content). These need new fields on Message following LiteLLM's conventions — see LOU-1472, LOU-1473, LOU-1474
  • Enabling thinking mode: Each provider requires opt-in (Anthropic: thinking param, Gemini: thinkingConfig, OpenAI: Responses API migration) — see LOU-1472, LOU-1473, LOU-1474
  • OpenAI Responses API: Completely different response format, needs new engine — LOU-1474
  • Native Gemini API: Thinking only available via native endpoint, not OpenAI-compat — LOU-1473

Provider Support

| Engine | Reasoning source | Extracts reasoning_content | Multi-turn round-trip |
|---|---|---|---|
| RemoteInferenceEngine (Together, Fireworks, OpenAI, OpenRouter, Gemini, GCP/Vertex, etc.) | `message.reasoning` or `message.reasoning_content` | ✅ | ❌ Follow-up |
| Anthropic | `type: "thinking"` content blocks | ✅ | ❌ Needs thinking_blocks + signature (LOU-1472) |
| Bedrock | Same as Anthropic | ✅ | ❌ Same as Anthropic |
| SambaNova | Same as RemoteInferenceEngine | ✅ | ❌ Follow-up |
| SGLang/local | N/A (reasoning in text via `<think>` tags) | No change | N/A |
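
For the Anthropic/Bedrock row, the thinking-block extraction can be sketched like this (an illustrative helper, not the engine's actual code; it mirrors the PR's handling of interleaved blocks and separator-free concatenation):

```python
def split_thinking_blocks(content_blocks: list) -> tuple:
    """Separate Anthropic-style content blocks into response text and
    reasoning. Thinking and text blocks may interleave; each kind is
    concatenated in order, without separators."""
    text_parts = []
    thinking_parts = []
    for block in content_blocks:
        block_type = block.get("type")  # tolerate blocks without a type key
        if block_type == "thinking" and block.get("thinking") is not None:
            thinking_parts.append(block["thinking"])
        elif block_type == "text" and block.get("text") is not None:
            text_parts.append(block["text"])
    reasoning = "".join(thinking_parts) if thinking_parts else None
    return "".join(text_parts), reasoning
```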

Training Pipeline

  • DefaultProcessor._convert_messages_to_dicts() checks template support via Jinja2 AST parsing (no false positives from comments/strings)
  • Template supports reasoning (e.g., Qwen3): passes both reasoning_content and reasoning keys — template renders natively
  • Template doesn't support reasoning (e.g., Llama, DeepSeek): folds reasoning into content wrapped in <think> tags
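
A sketch of the two-path training logic above, assuming Jinja2-style chat templates (function names and the exact fold format are illustrative, not the PR's implementation):

```python
from jinja2 import Environment, nodes

def template_supports_reasoning(chat_template: str) -> bool:
    """Detect whether a chat template actually reads reasoning_content.

    Walking the Jinja2 AST avoids false positives from the string
    "reasoning_content" appearing in comments or string literals.
    """
    ast = Environment().parse(chat_template)
    for node in ast.find_all((nodes.Getattr, nodes.Getitem)):
        if isinstance(node, nodes.Getattr) and node.attr == "reasoning_content":
            return True  # e.g. {{ message.reasoning_content }}
        if isinstance(node, nodes.Getitem):
            key = node.arg
            if isinstance(key, nodes.Const) and key.value == "reasoning_content":
                return True  # e.g. {{ message['reasoning_content'] }}
    return False

def fold_reasoning(content: str, reasoning) -> str:
    """Fallback for templates without native support: wrap reasoning in
    <think> tags and prepend it to the message content."""
    if reasoning is None:
        return content
    return f"<think>\n{reasoning}\n</think>\n\n{content}"
```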

Context

The Together API returns content: null for thinking models (e.g., Qwen3.5-9B) when the token budget is exhausted by reasoning. This caused _convert_api_output_to_conversation to fail with 2 validation errors for Message. Discovered while debugging a production evaluation in which every judge call failed (100% failure rate).

Related Issues

Towards LOU-1466
Fixes LOU-1469
Fixes LOU-1470

Test Plan

  • RemoteInferenceEngine: null content, null reasoning, empty reasoning, reasoning_content fallback, precedence, content+reasoning
  • Anthropic: thinking blocks, no thinking, null thinking, null text, interleaved blocks, empty content, missing content key, blocks without type key
  • Bedrock: thinking blocks, no thinking blocks
  • SambaNova: reasoning field, null content with reasoning
  • conversation_utils: preserve reasoning in remove_excessive_images and truncate_text_in_content_items, reasoning serialization
  • DefaultProcessor: reasoning with supporting template, unsupporting template (fallback), no reasoning
  • Template detection: Jinja2 AST — comment false positive, string literal false positive, attribute access, dict access, no reasoning
  • Integration tested with real API calls: Together Qwen3.5-9B (sufficient + insufficient tokens), Anthropic Claude Sonnet 4 (with/without thinking), OpenAI GPT-4.1-mini + GPT-5, Gemini 2.5 Flash
  • Anthropic multi-turn: verified reasoning keys stripped from API requests (previously caused 400 error)

@gitar-bot

gitar-bot bot commented Apr 6, 2026


@jgreer013 force-pushed the jgreer013/fix-null-content-thinking-models branch 4 times, most recently from e175af7 to 17b8004 on April 6, 2026 at 19:36
Contributor

@idoudali idoudali left a comment


LGTM

@jgreer013 force-pushed the jgreer013/fix-null-content-thinking-models branch from 17b8004 to 9df7a6d on April 6, 2026 at 20:18
Comment thread src/oumi/inference/anthropic_inference_engine.py Outdated
@jgreer013 force-pushed the jgreer013/fix-null-content-thinking-models branch from 9df7a6d to 201db45 on April 6, 2026 at 20:23
Comment thread src/oumi/inference/anthropic_inference_engine.py Outdated
@jgreer013 jgreer013 requested a review from oelachqar April 6, 2026 21:34
@jgreer013 force-pushed the jgreer013/fix-null-content-thinking-models branch 2 times, most recently from ae83dc2 to ceb96cc on April 6, 2026 at 21:38
@jgreer013 changed the title from "fix(inference): handle null content from thinking models in API responses" to "fix(inference): handle null content and propagate reasoning from thinking models" on Apr 6, 2026
Comment thread src/oumi/inference/bedrock_inference_engine.py
@jgreer013 force-pushed the jgreer013/fix-null-content-thinking-models branch 7 times, most recently from d4c0525 to 538e229 on April 6, 2026 at 23:32
Comment thread src/oumi/utils/conversation_utils.py Outdated
@jgreer013 force-pushed the jgreer013/fix-null-content-thinking-models branch from 538e229 to 0c1a7eb on April 6, 2026 at 23:50
Comment thread src/oumi/core/processors/default_processor.py Outdated
@jgreer013 force-pushed the jgreer013/fix-null-content-thinking-models branch 2 times, most recently from d3c074a to c498919 on April 7, 2026 at 00:20
Comment thread src/oumi/core/processors/default_processor.py
@jgreer013 force-pushed the jgreer013/fix-null-content-thinking-models branch from c498919 to 1dc3f5f on April 7, 2026 at 00:24
Comment thread src/oumi/utils/conversation_utils.py Outdated
@jgreer013 force-pushed the jgreer013/fix-null-content-thinking-models branch 3 times, most recently from 06cfd6c to d9b2adf on April 7, 2026 at 03:25
…king models

Some providers return content=null for thinking models when the token
budget is consumed by reasoning. This caused a Pydantic ValidationError
in Message() because content must be str | list[ContentItem], not None.

Changes:
- Add optional `reasoning` field to Message for per-turn reasoning content
- RemoteInferenceEngine (Together, Fireworks, OpenAI, OpenRouter, Gemini,
  GCP/Vertex, Cerebras, DeepSeek, Parasail, RemoteVLLM): default content
  to empty string when null, extract reasoning from "reasoning" or
  "reasoning_content" fields
- AnthropicInferenceEngine: extract reasoning from "thinking" content
  blocks, handle interleaved thinking/text blocks correctly
- BedrockInferenceEngine: same as Anthropic (thinking content blocks)
- SambanovaInferenceEngine: same as RemoteInferenceEngine (OpenAI-compat)
- SGLang/local engines: no change (reasoning embedded in text via tags)

Null handling: uses explicit `is None` checks throughout (not truthiness)
so that empty-string reasoning/content is preserved correctly.

Multiple content blocks (Anthropic/Bedrock) are concatenated directly
without separators, matching the original behavior.

Tests added for all updated engines including null/empty edge cases
and interleaved block ordering.
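
The "explicit `is None` checks" point in the commit message above can be made concrete with a small sketch (the helper name is illustrative): a truthiness-based `or` chain would silently drop an empty-string reasoning field, while an explicit None check preserves it.

```python
def first_non_null(*values):
    """Return the first argument that is not None. Unlike an `or`
    chain, an empty string counts as a real value and is preserved."""
    for value in values:
        if value is not None:
            return value
    return None
```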
…king models

Some providers return content=null for thinking models when the token
budget is consumed by reasoning. This caused a Pydantic ValidationError
in Message() because content must be str | list[ContentItem], not None.

Changes:
- Add optional `reasoning_content` field to Message for per-turn
  reasoning. Named `reasoning_content` to match Qwen3's chat template
  convention — templates that support this field render it natively
  inside <think> tags
- RemoteInferenceEngine (Together, Fireworks, OpenAI, OpenRouter, Gemini,
  GCP/Vertex, Cerebras, DeepSeek, Parasail, RemoteVLLM): default content
  to empty string when null, extract reasoning from "reasoning" or
  "reasoning_content" response fields
- AnthropicInferenceEngine: extract reasoning from "thinking" content
  blocks, handle interleaved thinking/text blocks correctly
- BedrockInferenceEngine: same as Anthropic (thinking content blocks)
- SambanovaInferenceEngine: same as RemoteInferenceEngine (OpenAI-compat)
- conversation_utils: preserve reasoning_content when reconstructing
  Messages in remove_excessive_images and truncate_text_in_content_items
- SGLang/local engines: no change (reasoning embedded in text via tags)

Null handling: uses explicit `is None` checks (not truthiness) so that
empty-string reasoning is preserved correctly.

Multiple content blocks (Anthropic/Bedrock) are concatenated directly
without separators, matching Anthropic's recommended approach.

Training: models whose chat templates reference `reasoning_content`
(e.g., Qwen3) will tokenize it natively. For models without template
support, a fallback to prepend reasoning as <think> tags in content
is needed (separate follow-up).
@jgreer013 force-pushed the jgreer013/fix-null-content-thinking-models branch from d9b2adf to d195f63 on April 7, 2026 at 18:16