fix: preserve tool_calls and tool_call_id through message processing#1024

Open
limey wants to merge 2 commits into Blaizzy:main from limey:fix/multi-turn-tool-calls
Conversation

@limey limey commented Apr 14, 2026

Multi-turn tool calling loops because two independent passes strip tool metadata before the tokenizer's Jinja chat template runs, and a third bug causes double-encoded arguments on subsequent tool calls.


Bug 1 — server.py chat_completions_endpoint (~line 1068)

The message loop rebuilds each message as a plain {role, content} dict. For an assistant message with content: null, tool_calls: [...] this produces {"role": "assistant", "content": ""}; tool_calls is gone. For a role: tool message, tool_call_id is similarly dropped.
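A minimal sketch of the lossy rebuild (names and message shapes are illustrative, not the actual server.py code):

```python
# Sketch of Bug 1: rebuilding messages as plain {role, content} dicts
# silently drops tool metadata. Illustrative, not the real server.py.
assistant_msg = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{"id": "call_1", "type": "function",
                    "function": {"name": "get_time", "arguments": "{}"}}],
}

# Buggy rebuild: only role and content survive.
processed = {"role": assistant_msg["role"],
             "content": assistant_msg.get("content") or ""}

assert processed == {"role": "assistant", "content": ""}
assert "tool_calls" not in processed  # metadata lost before templating
```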

Bug 2 — prompt_utils.py apply_chat_template (~line 713)

All dict messages are routed through _get_role_content → get_message_json, neither of which carries tool_calls or tool_call_id. Even if Bug 1 were fixed in isolation, this pass would strip the fields again.
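The second stripping pass, and a pass-through branch that avoids it, can be sketched as follows (the `_strip` helper stands in for the `_get_role_content` / `get_message_json` path; body is illustrative, not the real prompt_utils.py code):

```python
def _strip(message):
    # Stand-in for _get_role_content -> get_message_json, which keep
    # only role and content (per the PR description).
    return {"role": message["role"], "content": message.get("content") or ""}

def process(messages):
    processed = []
    for m in messages:
        # Pass-through branch: any dict carrying tool metadata is
        # appended as-is so it reaches the tokenizer's chat template.
        if isinstance(m, dict) and (m.get("tool_calls") or m.get("role") == "tool"):
            processed.append(m)
            continue
        processed.append(_strip(m))  # existing path for ordinary messages
    return processed
```

For example, a `{"role": "tool", "tool_call_id": ...}` result message passes through untouched, while a plain user message still goes down the original path.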

Bug 3 — server.py: arguments passed as JSON string instead of dict

The OpenAI wire format stores function.arguments as a JSON string. Gemma 4's Jinja chat template expects a native object. When passed a string, the template embeds it verbatim in the model context, so on the next turn the model mirrors it back using <|"|> escapes around the JSON fragments rather than around individual string values. The parser then decodes those fragments literally, producing double-encoded arguments — e.g. JSON.parse gives { '{"date"': '"2026-04-14"}' } instead of { date: '2026-04-14' }.
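The decode/double-encode difference can be illustrated with plain `json` calls (values taken from the example above):

```python
import json

# OpenAI wire format: function.arguments arrives as a JSON *string*.
wire = '{"date": "2026-04-14"}'

# What the chat template needs: a native dict.
args = json.loads(wire)
assert args == {"date": "2026-04-14"}

# The failure mode described above: after the model mirrors the raw
# string back with escapes around JSON fragments, the parser recovers
# a dict built from those fragments instead of the intended object.
mangled = {'{"date"': '"2026-04-14"}'}
assert "date" not in mangled  # the real key is unreachable
```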


Fixes

  • server.py (Bug 1): accumulate into a local msg dict, then conditionally attach tool_calls and tool_call_id before appending. Also adds a previously-missing else branch for unrecognised content types.
  • prompt_utils.py (Bug 2): insert a pass-through branch before the _get_role_content branch — any dict with tool_calls or role == "tool" is appended as-is.
  • server.py (Bug 3): parse arguments from JSON string back to dict during tool_calls serialisation. Falls back to the original string if json.loads fails.
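Taken together, the server.py fixes for Bugs 1 and 3 amount to something like the following sketch (it assumes tool calls are plain dicts; per the commit message, the real code also serialises pydantic objects via model_dump):

```python
import json

def rebuild_message(m: dict) -> dict:
    """Sketch of the fixed rebuild: keep tool metadata and decode
    function arguments back to a dict. Illustrative names only."""
    msg = {"role": m["role"], "content": m.get("content") or ""}
    if m.get("tool_calls"):
        calls = []
        for call in m["tool_calls"]:
            call = dict(call)
            fn = dict(call.get("function", {}))
            args = fn.get("arguments")
            if isinstance(args, str):
                try:
                    fn["arguments"] = json.loads(args)  # Bug 3: string -> dict
                except json.JSONDecodeError:
                    pass  # fall back to the original string, don't lose it
            call["function"] = fn
            calls.append(call)
        msg["tool_calls"] = calls                       # Bug 1: keep tool_calls
    if m.get("tool_call_id"):
        msg["tool_call_id"] = m["tool_call_id"]         # Bug 1: keep tool_call_id
    return msg
```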

All changes are non-breaking (no-tool conversations are unaffected).

Validation

Tested against mlx-community/gemma-4-26b-a4b-it-4bit with a multi-turn agentic workflow requiring three sequential tool calls (get_time_entries → get_holidays → get_bookings). Before: the model looped on the first tool call. After: all three tool calls resolved correctly with results accumulated in history.

limey and others added 2 commits April 15, 2026 07:51
Multi-turn tool calling looped because two independent passes stripped
tool-calling metadata before the tokenizer's Jinja chat template ran:

1. server.py chat_completions_endpoint rebuilt each message as a plain
   {role, content} dict, silently dropping tool_calls on assistant
   messages and tool_call_id on tool-result messages.

2. prompt_utils.py apply_chat_template routed all dict messages through
   _get_role_content → get_message_json, neither of which carry
   tool_calls or tool_call_id, so a second stripping pass occurred even
   if server.py were fixed independently.

Fix 1: accumulate into a local `msg` dict, then conditionally attach
tool_calls (serialised via model_dump if needed) and tool_call_id before
appending to processed_messages. Adds a previously-missing else branch
for unrecognised content types.

Fix 2: insert a pass-through branch in the list-processing loop before
the _get_role_content branch — any dict with tool_calls or role=="tool"
is appended as-is, reaching apply_chat_template → tokenizer intact.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…enizer

The OpenAI wire format stores function arguments as a JSON string.
Gemma 4's Jinja chat template expects a native object. When passed a
string, the template embeds it verbatim in the model context, so on the
next turn the model mirrors it back using <|"|> escapes around the JSON
fragments rather than around individual string values. The parser then
decodes those fragments literally, producing double-encoded arguments
(e.g. key '{"date"' with value '"2026-04-14"}').

Parse arguments from JSON string back to dict during tool_calls
serialisation in processed_messages. Falls back to the original string
if json.loads fails, so non-JSON argument strings are not silently lost.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>