fix(inference): handle null content and propagate reasoning from thinking models#2353
Open
fix(inference): handle null content and propagate reasoning from thinking models

Some providers return content=null for thinking models when the token budget is consumed by reasoning. This caused a Pydantic ValidationError in Message() because content must be str | list[ContentItem], not None.

Changes:
- Add optional `reasoning_content` field to Message for per-turn reasoning. Named `reasoning_content` to match Qwen3's chat template convention; templates that support this field render it natively inside <think> tags.
- RemoteInferenceEngine (Together, Fireworks, OpenAI, OpenRouter, Gemini, GCP/Vertex, Cerebras, DeepSeek, Parasail, RemoteVLLM): default content to an empty string when null; extract reasoning from the "reasoning" or "reasoning_content" response fields.
- AnthropicInferenceEngine: extract reasoning from "thinking" content blocks; handle interleaved thinking/text blocks correctly.
- BedrockInferenceEngine: same as Anthropic (thinking content blocks).
- SambanovaInferenceEngine: same as RemoteInferenceEngine (OpenAI-compatible).
- conversation_utils: preserve reasoning_content when reconstructing Messages in remove_excessive_images and truncate_text_in_content_items.
- SGLang/local engines: no change (reasoning is embedded in the text via tags).

Null handling uses explicit `is None` checks (not truthiness) so that empty-string reasoning is preserved correctly. Multiple content blocks (Anthropic/Bedrock) are concatenated directly without separators, matching Anthropic's recommended approach.

Training: models whose chat templates reference `reasoning_content` (e.g., Qwen3) will tokenize it natively. For models without template support, a fallback that prepends reasoning as <think> tags in content is needed (separate follow-up).
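The extraction path described above can be sketched as follows. This is a minimal illustration, not the actual engine code; the helper name `extract_content_and_reasoning` is an assumption. It shows the explicit `is None` pattern from the commit message, which keeps empty-string values intact where a truthiness check would drop them:

```python
from typing import Any, Optional, Tuple


def extract_content_and_reasoning(
    message: dict[str, Any],
) -> Tuple[str, Optional[str]]:
    """Hypothetical helper mirroring the OpenAI-compatible extraction path.

    Uses explicit `is None` checks so that empty-string reasoning or
    content is preserved rather than discarded by truthiness tests.
    """
    content = message.get("content")
    if content is None:
        # Thinking models may spend the whole token budget on reasoning,
        # leaving content=null; default to "" so Message validation passes.
        content = ""

    # Providers disagree on the field name: try "reasoning" first,
    # then "reasoning_content".
    reasoning = message.get("reasoning")
    if reasoning is None:
        reasoning = message.get("reasoning_content")
    return content, reasoning


# Null content with reasoning only:
print(extract_content_and_reasoning(
    {"content": None, "reasoning_content": "step 1..."}
))  # → ('', 'step 1...')
```

Note that `{"content": None, "reasoning": ""}` yields `("", "")`: the empty reasoning string survives, which is exactly what a `message.get("reasoning") or ...` chain would get wrong.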
Summary
- Add a `reasoning_content` field to `Message` for per-turn reasoning/thinking content
- Handle `content: null` in API responses (default to an empty string)
- `<think>` tag injection (fallback for templates without native support)
- Preserve `reasoning_content` in conversation utilities

In Scope
- `Message.reasoning_content` field
- Render `reasoning_content` and `reasoning` keys for native templates (Qwen3); fall back to `<think>` tag injection for unsupported templates
- `include_reasoning=False` by default; verified Anthropic rejects unknown fields

Out of Scope (Follow-ups)
- Signed/encrypted thinking blocks (Anthropic: `thinking_blocks` with `signature`, Gemini: `thoughtSignature`, OpenAI: `encrypted_content`). These need new fields on `Message` following LiteLLM's conventions; see LOU-1472, LOU-1473, LOU-1474
- Provider-side thinking configuration (Anthropic: `thinking` param, Gemini: `thinkingConfig`, OpenAI: Responses API migration); see LOU-1472, LOU-1473, LOU-1474

Provider Support
`reasoning_content` is populated per provider family:

- OpenAI-compatible APIs (Together, Fireworks, OpenAI, OpenRouter, DeepSeek, SambaNova, etc.): `message.reasoning` or `message.reasoning_content`
- Anthropic / Bedrock: `type: "thinking"` content blocks
- Signed thinking blocks: `thinking_blocks` + signature (follow-up, LOU-1472)
- SGLang / local engines: no extraction (reasoning embedded in text via `<think>` tags)

Training Pipeline
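For the Anthropic/Bedrock case, a minimal sketch of separating `thinking` blocks from `text` blocks (the helper name is an assumption, not the actual engine code). Interleaved blocks are handled by keeping two accumulators in iteration order, and blocks are joined without separators, as the commit message describes:

```python
from typing import Any, Optional, Tuple


def split_thinking_blocks(
    blocks: list[dict[str, Any]],
) -> Tuple[str, Optional[str]]:
    """Concatenate text and thinking blocks separately, preserving order.

    Returns (content, reasoning); reasoning is None when no thinking
    blocks are present, so a non-thinking response is unchanged.
    """
    text_parts: list[str] = []
    thinking_parts: list[str] = []
    for block in blocks:
        if block.get("type") == "thinking":
            thinking_parts.append(block.get("thinking", ""))
        elif block.get("type") == "text":
            text_parts.append(block.get("text", ""))
    reasoning = "".join(thinking_parts) if thinking_parts else None
    return "".join(text_parts), reasoning


# Interleaved thinking/text blocks keep their relative order:
blocks = [
    {"type": "thinking", "thinking": "plan A"},
    {"type": "text", "text": "Hello"},
    {"type": "thinking", "thinking": " then B"},
    {"type": "text", "text": ", world"},
]
print(split_thinking_blocks(blocks))  # → ('Hello, world', 'plan A then B')
```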
- `DefaultProcessor._convert_messages_to_dicts()` checks template support via Jinja2 AST parsing (no false positives from comments/strings)
- Templates that reference the `reasoning_content` and `reasoning` keys (e.g., Qwen3) render it natively
- Otherwise, fall back to `<think>` tags

Context
The Together API returns `content: null` for thinking models (e.g., Qwen3.5-9B) when the token budget is exhausted by reasoning. This caused `_convert_api_output_to_conversation` to fail with `2 validation errors for Message`. Discovered while debugging a production evaluation with a 100% failure rate on every judge call.

Related Issues
Towards LOU-1466
Fixes LOU-1469
Fixes LOU-1470
Test Plan