fix: preserve tool_calls and tool_call_id through message processing#1024

Open
limey wants to merge 2 commits into Blaizzy:main from limey:fix/multi-turn-tool-calls
Conversation

@limey limey commented Apr 14, 2026

Multi-turn tool calling loops because two independent passes strip tool metadata before the tokenizer's Jinja chat template runs, and a third bug causes double-encoded arguments on subsequent tool calls.


Bug 1 — server.py chat_completions_endpoint (~line 1068)

The message loop rebuilds each message as a plain {role, content} dict. For an assistant message with content: null, tool_calls: [...] this produces {"role": "assistant", "content": ""}; tool_calls is gone. For a role: tool message, tool_call_id is similarly dropped.
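A minimal sketch of the lossy rebuild (names and message shapes are illustrative, not the actual server.py code):

```python
# Sketch of Bug 1: rebuilding messages as plain {role, content} dicts
# silently drops tool metadata. Illustrative, not the real server.py.
assistant_msg = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{"id": "call_1", "type": "function",
                    "function": {"name": "get_time", "arguments": "{}"}}],
}

# Buggy rebuild: only role and content survive.
processed = {"role": assistant_msg["role"],
             "content": assistant_msg.get("content") or ""}

assert processed == {"role": "assistant", "content": ""}
assert "tool_calls" not in processed  # metadata lost before templating
```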

Bug 2 — prompt_utils.py apply_chat_template (~line 713)

All dict messages are routed through _get_role_content → get_message_json, neither of which carries tool_calls or tool_call_id. Even if Bug 1 were fixed in isolation, this pass would strip the fields again.
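The second stripping pass, and a pass-through branch that avoids it, can be sketched as follows (the `_strip` helper stands in for the `_get_role_content` / `get_message_json` path; body is illustrative, not the real prompt_utils.py code):

```python
def _strip(message):
    # Stand-in for _get_role_content -> get_message_json, which keep
    # only role and content (per the PR description).
    return {"role": message["role"], "content": message.get("content") or ""}

def process(messages):
    processed = []
    for m in messages:
        # Pass-through branch: any dict carrying tool metadata is
        # appended as-is so it reaches the tokenizer's chat template.
        if isinstance(m, dict) and (m.get("tool_calls") or m.get("role") == "tool"):
            processed.append(m)
            continue
        processed.append(_strip(m))  # existing path for ordinary messages
    return processed
```

For example, a `{"role": "tool", "tool_call_id": ...}` result message passes through untouched, while a plain user message still goes down the original path.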

Bug 3 — server.py: arguments passed as JSON string instead of dict

The OpenAI wire format stores function.arguments as a JSON string. Gemma 4's Jinja chat template expects a native object. When passed a string, the template embeds it verbatim in the model context, so on the next turn the model mirrors it back using <|"|> escapes around the JSON fragments rather than around individual string values. The parser then decodes those fragments literally, producing double-encoded arguments — e.g. JSON.parse gives { '{"date"': '"2026-04-14"}' } instead of { date: '2026-04-14' }.
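The decode/double-encode difference can be illustrated with plain `json` calls (values taken from the example above):

```python
import json

# OpenAI wire format: function.arguments arrives as a JSON *string*.
wire = '{"date": "2026-04-14"}'

# What the chat template needs: a native dict.
args = json.loads(wire)
assert args == {"date": "2026-04-14"}

# The failure mode described above: after the model mirrors the raw
# string back with escapes around JSON fragments, the parser recovers
# a dict built from those fragments instead of the intended object.
mangled = {'{"date"': '"2026-04-14"}'}
assert "date" not in mangled  # the real key is unreachable
```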


Fixes

  • server.py (Bug 1): accumulate into a local msg dict, then conditionally attach tool_calls and tool_call_id before appending. Also adds a previously-missing else branch for unrecognised content types.
  • prompt_utils.py (Bug 2): insert a pass-through branch before the _get_role_content branch — any dict with tool_calls or role == "tool" is appended as-is.
  • server.py (Bug 3): parse arguments from JSON string back to dict during tool_calls serialisation. Falls back to the original string if json.loads fails.
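Taken together, the server.py fixes for Bugs 1 and 3 amount to something like the following sketch (it assumes tool calls are plain dicts; per the commit message, the real code also serialises pydantic objects via model_dump):

```python
import json

def rebuild_message(m: dict) -> dict:
    """Sketch of the fixed rebuild: keep tool metadata and decode
    function arguments back to a dict. Illustrative names only."""
    msg = {"role": m["role"], "content": m.get("content") or ""}
    if m.get("tool_calls"):
        calls = []
        for call in m["tool_calls"]:
            call = dict(call)
            fn = dict(call.get("function", {}))
            args = fn.get("arguments")
            if isinstance(args, str):
                try:
                    fn["arguments"] = json.loads(args)  # Bug 3: string -> dict
                except json.JSONDecodeError:
                    pass  # fall back to the original string, don't lose it
            call["function"] = fn
            calls.append(call)
        msg["tool_calls"] = calls                       # Bug 1: keep tool_calls
    if m.get("tool_call_id"):
        msg["tool_call_id"] = m["tool_call_id"]         # Bug 1: keep tool_call_id
    return msg
```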

All changes are non-breaking (no-tool conversations are unaffected).

Validation

Tested against mlx-community/gemma-4-26b-a4b-it-4bit with a multi-turn agentic workflow requiring three sequential tool calls (get_time_entries → get_holidays → get_bookings). Before: the model looped on the first tool call. After: all three tool calls resolved correctly with results accumulated in history.

limey and others added 2 commits April 15, 2026 07:51
Multi-turn tool calling looped because two independent passes stripped
tool-calling metadata before the tokenizer's Jinja chat template ran:

1. server.py chat_completions_endpoint rebuilt each message as a plain
   {role, content} dict, silently dropping tool_calls on assistant
   messages and tool_call_id on tool-result messages.

2. prompt_utils.py apply_chat_template routed all dict messages through
   _get_role_content → get_message_json, neither of which carry
   tool_calls or tool_call_id, so a second stripping pass occurred even
   if server.py were fixed independently.

Fix 1: accumulate into a local `msg` dict, then conditionally attach
tool_calls (serialised via model_dump if needed) and tool_call_id before
appending to processed_messages. Adds a previously-missing else branch
for unrecognised content types.

Fix 2: insert a pass-through branch in the list-processing loop before
the _get_role_content branch — any dict with tool_calls or role=="tool"
is appended as-is, reaching apply_chat_template → tokenizer intact.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…enizer

The OpenAI wire format stores function arguments as a JSON string.
Gemma 4's Jinja chat template expects a native object. When passed a
string, the template embeds it verbatim in the model context, so on the
next turn the model mirrors it back using <|"|> escapes around the JSON
fragments rather than around individual string values. The parser then
decodes those fragments literally, producing double-encoded arguments
(e.g. key '{"date"' with value '"2026-04-14"}').

Parse arguments from JSON string back to dict during tool_calls
serialisation in processed_messages. Falls back to the original string
if json.loads fails, so non-JSON argument strings are not silently lost.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>