fix(inference): handle null content and propagate reasoning from thinking models #2353

Open

jgreer013 wants to merge 2 commits into main from jgreer013/fix-null-content-thinking-models

Conversation

@jgreer013
Contributor

@jgreer013 jgreer013 commented Apr 6, 2026

Summary

  • Add optional reasoning_content field to Message for per-turn reasoning/thinking content
  • Handle content: null in API responses (default to empty string)
  • Extract reasoning across all providers
  • Training pipeline: detect template reasoning support via Jinja2 AST, fallback to <think> tag injection
  • Strip reasoning keys from API requests (providers like Anthropic reject unknown fields)
  • Preserve reasoning_content in conversation utilities
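
As a rough sketch of the null-content handling and reasoning extraction described above (the function name and dict shape are illustrative, not the PR's actual code — the real logic lives in the inference engines and builds a `Message`):

```python
def parse_api_message(raw: dict) -> dict:
    """Build Message kwargs from a raw chat-completion message dict.

    Thinking models may return content: null when the token budget is
    exhausted by reasoning, so content defaults to an empty string.
    """
    content = raw.get("content")
    if content is None:  # explicit None check: "" is valid content
        content = ""
    # Prefer "reasoning", fall back to "reasoning_content".
    reasoning = raw.get("reasoning")
    if reasoning is None:
        reasoning = raw.get("reasoning_content")
    return {
        "role": raw.get("role", "assistant"),
        "content": content,
        "reasoning_content": reasoning,
    }
```

With this shape, a response like `{"content": null, "reasoning": "..."}` no longer raises a validation error when constructing a `Message`.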

In Scope

  • Response parsing: Extract reasoning from all providers into Message.reasoning_content
  • Training: Detect template support via Jinja2 AST, pass both reasoning_content and reasoning keys for native templates (Qwen3), fall back to <think> tag injection for unsupported templates
  • API safety: Reasoning keys stripped from message dicts sent to APIs (include_reasoning=False by default) — verified Anthropic rejects unknown fields
  • Backward compat: No regressions for non-thinking models across all tested providers
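
The API-safety behavior can be sketched as follows (a minimal illustration of the `include_reasoning=False` default described above; the helper name is hypothetical):

```python
# Keys that providers such as Anthropic reject as unknown fields.
_REASONING_KEYS = ("reasoning", "reasoning_content")

def to_api_dict(message: dict, include_reasoning: bool = False) -> dict:
    """Serialize a message dict for an API request, stripping
    reasoning keys unless the caller explicitly opts in."""
    if include_reasoning:
        return dict(message)
    return {k: v for k, v in message.items() if k not in _REASONING_KEYS}
```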

Out of Scope (Follow-ups)

  • Multi-turn reasoning round-tripping: Providers require opaque signatures/tokens to maintain reasoning state across turns (Anthropic: thinking_blocks with signature, Gemini: thoughtSignature, OpenAI: encrypted_content). These need new fields on Message following LiteLLM's conventions — see LOU-1472, LOU-1473, LOU-1474
  • Enabling thinking mode: Each provider requires opt-in (Anthropic: thinking param, Gemini: thinkingConfig, OpenAI: Responses API migration) — see LOU-1472, LOU-1473, LOU-1474
  • OpenAI Responses API: Completely different response format, needs new engine — LOU-1474
  • Native Gemini API: Thinking only available via native endpoint, not OpenAI-compat — LOU-1473

Provider Support

| Engine | Reasoning source | Extracts reasoning_content | Multi-turn round-trip |
|---|---|---|---|
| RemoteInferenceEngine (Together, Fireworks, OpenAI, OpenRouter, Gemini, GCP/Vertex, etc.) | `message.reasoning` or `message.reasoning_content` | ✅ | ❌ Follow-up |
| Anthropic | `type: "thinking"` content blocks | ✅ | ❌ Needs thinking_blocks + signature (LOU-1472) |
| Bedrock | Same as Anthropic | ✅ | ❌ Same as Anthropic |
| SambaNova | Same as RemoteInferenceEngine | ✅ | ❌ Follow-up |
| SGLang/local | N/A (reasoning in text via `<think>` tags) | No change | N/A |
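
For the Anthropic/Bedrock row, the thinking-block extraction can be sketched like this (an illustrative helper, not the engine's actual code; it mirrors the PR's handling of interleaved blocks and separator-free concatenation):

```python
def split_thinking_blocks(content_blocks: list) -> tuple:
    """Separate Anthropic-style content blocks into response text and
    reasoning. Thinking and text blocks may interleave; each kind is
    concatenated in order, without separators."""
    text_parts = []
    thinking_parts = []
    for block in content_blocks:
        block_type = block.get("type")  # tolerate blocks without a type key
        if block_type == "thinking" and block.get("thinking") is not None:
            thinking_parts.append(block["thinking"])
        elif block_type == "text" and block.get("text") is not None:
            text_parts.append(block["text"])
    reasoning = "".join(thinking_parts) if thinking_parts else None
    return "".join(text_parts), reasoning
```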

Training Pipeline

  • DefaultProcessor._convert_messages_to_dicts() checks template support via Jinja2 AST parsing (no false positives from comments/strings)
  • Template supports reasoning (e.g., Qwen3): passes both reasoning_content and reasoning keys — template renders natively
  • Template doesn't support reasoning (e.g., Llama, DeepSeek): folds reasoning into content wrapped in <think> tags
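
A sketch of the two-path training logic above, assuming Jinja2-style chat templates (function names and the exact fold format are illustrative, not the PR's implementation):

```python
from jinja2 import Environment, nodes

def template_supports_reasoning(chat_template: str) -> bool:
    """Detect whether a chat template actually reads reasoning_content.

    Walking the Jinja2 AST avoids false positives from the string
    "reasoning_content" appearing in comments or string literals.
    """
    ast = Environment().parse(chat_template)
    for node in ast.find_all((nodes.Getattr, nodes.Getitem)):
        if isinstance(node, nodes.Getattr) and node.attr == "reasoning_content":
            return True  # e.g. {{ message.reasoning_content }}
        if isinstance(node, nodes.Getitem):
            key = node.arg
            if isinstance(key, nodes.Const) and key.value == "reasoning_content":
                return True  # e.g. {{ message['reasoning_content'] }}
    return False

def fold_reasoning(content: str, reasoning) -> str:
    """Fallback for templates without native support: wrap reasoning in
    <think> tags and prepend it to the message content."""
    if reasoning is None:
        return content
    return f"<think>\n{reasoning}\n</think>\n\n{content}"
```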

Context

The Together API returns content: null for thinking models (e.g., Qwen3.5-9B) when the token budget is exhausted by reasoning. This caused _convert_api_output_to_conversation to fail with 2 validation errors for Message. Discovered while debugging a production evaluation in which every judge call failed (100% failure rate).

Related Issues

Towards LOU-1466
Fixes LOU-1469
Fixes LOU-1470

Test Plan

  • RemoteInferenceEngine: null content, null reasoning, empty reasoning, reasoning_content fallback, precedence, content+reasoning
  • Anthropic: thinking blocks, no thinking, null thinking, null text, interleaved blocks, empty content, missing content key, blocks without type key
  • Bedrock: thinking blocks, no thinking blocks
  • SambaNova: reasoning field, null content with reasoning
  • conversation_utils: preserve reasoning in remove_excessive_images and truncate_text_in_content_items, reasoning serialization
  • DefaultProcessor: reasoning with supporting template, unsupporting template (fallback), no reasoning
  • Template detection: Jinja2 AST — comment false positive, string literal false positive, attribute access, dict access, no reasoning
  • Integration tested with real API calls: Together Qwen3.5-9B (sufficient + insufficient tokens), Anthropic Claude Sonnet 4 (with/without thinking), OpenAI GPT-4.1-mini + GPT-5, Gemini 2.5 Flash
  • Anthropic multi-turn: verified reasoning keys stripped from API requests (previously caused 400 error)

@gitar-bot

gitar-bot bot commented Apr 6, 2026


@jgreer013 force-pushed the jgreer013/fix-null-content-thinking-models branch 4 times, most recently from e175af7 to 17b8004 on April 6, 2026 at 19:36
Contributor

@idoudali idoudali left a comment


LGTM

@jgreer013 force-pushed the jgreer013/fix-null-content-thinking-models branch from 17b8004 to 9df7a6d on April 6, 2026 at 20:18
Comment thread src/oumi/inference/anthropic_inference_engine.py Outdated
@jgreer013 force-pushed the jgreer013/fix-null-content-thinking-models branch from 9df7a6d to 201db45 on April 6, 2026 at 20:23
Comment thread src/oumi/inference/anthropic_inference_engine.py Outdated
@jgreer013 jgreer013 requested a review from oelachqar April 6, 2026 21:34
@jgreer013 force-pushed the jgreer013/fix-null-content-thinking-models branch 2 times, most recently from ae83dc2 to ceb96cc on April 6, 2026 at 21:38
@jgreer013 changed the title from "fix(inference): handle null content from thinking models in API responses" to "fix(inference): handle null content and propagate reasoning from thinking models" on Apr 6, 2026
Comment thread src/oumi/inference/bedrock_inference_engine.py
@jgreer013 force-pushed the jgreer013/fix-null-content-thinking-models branch 7 times, most recently from d4c0525 to 538e229 on April 6, 2026 at 23:32
Comment thread src/oumi/utils/conversation_utils.py Outdated
@jgreer013 force-pushed the jgreer013/fix-null-content-thinking-models branch from 538e229 to 0c1a7eb on April 6, 2026 at 23:50
Comment thread src/oumi/core/processors/default_processor.py Outdated
@jgreer013 force-pushed the jgreer013/fix-null-content-thinking-models branch 2 times, most recently from d3c074a to c498919 on April 7, 2026 at 00:20
Comment thread src/oumi/core/processors/default_processor.py
@jgreer013 force-pushed the jgreer013/fix-null-content-thinking-models branch from c498919 to 1dc3f5f on April 7, 2026 at 00:24
Comment thread src/oumi/utils/conversation_utils.py Outdated
@jgreer013 force-pushed the jgreer013/fix-null-content-thinking-models branch 3 times, most recently from 06cfd6c to d9b2adf on April 7, 2026 at 03:25
…king models

Some providers return content=null for thinking models when the token
budget is consumed by reasoning. This caused a Pydantic ValidationError
in Message() because content must be str | list[ContentItem], not None.

Changes:
- Add optional `reasoning` field to Message for per-turn reasoning content
- RemoteInferenceEngine (Together, Fireworks, OpenAI, OpenRouter, Gemini,
  GCP/Vertex, Cerebras, DeepSeek, Parasail, RemoteVLLM): default content
  to empty string when null, extract reasoning from "reasoning" or
  "reasoning_content" fields
- AnthropicInferenceEngine: extract reasoning from "thinking" content
  blocks, handle interleaved thinking/text blocks correctly
- BedrockInferenceEngine: same as Anthropic (thinking content blocks)
- SambanovaInferenceEngine: same as RemoteInferenceEngine (OpenAI-compat)
- SGLang/local engines: no change (reasoning embedded in text via tags)

Null handling: uses explicit `is None` checks throughout (not truthiness)
so that empty-string reasoning/content is preserved correctly.

Multiple content blocks (Anthropic/Bedrock) are concatenated directly
without separators, matching the original behavior.

Tests added for all updated engines including null/empty edge cases
and interleaved block ordering.
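
The "explicit `is None` checks" point in the commit message above can be made concrete with a small sketch (the helper name is illustrative): a truthiness-based `or` chain would silently drop an empty-string reasoning field, while an explicit None check preserves it.

```python
def first_non_null(*values):
    """Return the first argument that is not None. Unlike an `or`
    chain, an empty string counts as a real value and is preserved."""
    for value in values:
        if value is not None:
            return value
    return None
```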
…king models

Some providers return content=null for thinking models when the token
budget is consumed by reasoning. This caused a Pydantic ValidationError
in Message() because content must be str | list[ContentItem], not None.

Changes:
- Add optional `reasoning_content` field to Message for per-turn
  reasoning. Named `reasoning_content` to match Qwen3's chat template
  convention — templates that support this field render it natively
  inside <think> tags
- RemoteInferenceEngine (Together, Fireworks, OpenAI, OpenRouter, Gemini,
  GCP/Vertex, Cerebras, DeepSeek, Parasail, RemoteVLLM): default content
  to empty string when null, extract reasoning from "reasoning" or
  "reasoning_content" response fields
- AnthropicInferenceEngine: extract reasoning from "thinking" content
  blocks, handle interleaved thinking/text blocks correctly
- BedrockInferenceEngine: same as Anthropic (thinking content blocks)
- SambanovaInferenceEngine: same as RemoteInferenceEngine (OpenAI-compat)
- conversation_utils: preserve reasoning_content when reconstructing
  Messages in remove_excessive_images and truncate_text_in_content_items
- SGLang/local engines: no change (reasoning embedded in text via tags)

Null handling: uses explicit `is None` checks (not truthiness) so that
empty-string reasoning is preserved correctly.

Multiple content blocks (Anthropic/Bedrock) are concatenated directly
without separators, matching Anthropic's recommended approach.

Training: models whose chat templates reference `reasoning_content`
(e.g., Qwen3) will tokenize it natively. For models without template
support, a fallback to prepend reasoning as <think> tags in content
is needed (separate follow-up).
@jgreer013 force-pushed the jgreer013/fix-null-content-thinking-models branch from d9b2adf to d195f63 on April 7, 2026 at 18:16