
Add tool aware collator to mask tool response correctly#2356

Closed
shanghongsim wants to merge 5 commits into main from shanghong/tool-aware-collator

Conversation

@shanghongsim
Contributor

@shanghongsim shanghongsim commented Apr 8, 2026

Description

Currently, the completions-only collator does not mask tool responses (Role.TOOL) correctly.

Without instruction_template, tool results happen to be masked correctly: the collator finds the last response_template and masks everything before it, so only the final assistant turn is trained. With instruction_template, the collator instead searches for instruction_template/response_template pairs and masks everything in between, and some tool results end up unmasked.

Case 1: without instruction_template

from trl import DataCollatorForCompletionOnlyLM

RESPONSE_TEMPLATE = "<|im_start|>assistant\n"

old_no_inst = DataCollatorForCompletionOnlyLM(
    response_template=RESPONSE_TEMPLATE,
    instruction_template=None,
    tokenizer=tokenizer,
)
b = old_no_inst.torch_call([token_ids])  # token_ids: a pre-tokenized conversation
labels_A = b["labels"][0].tolist()
summarise("Case A", labels_A, N)  # summarise/show_masking/N: local demo helpers
show_masking(b["input_ids"][0].tolist(), labels_A, tokenizer)
[screenshot: Case A masking output]

Without an instruction template, the collator masks everything before the last response template, so the loss only sees:

<|im_start|>assistant
The weather in Paris is sunny and 18°C.<|im_end|>
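The no-instruction_template behavior can be sketched in pure Python (a toy reimplementation with made-up token IDs, not TRL's actual code):

```python
IGNORE_INDEX = -100  # label value that the loss function skips


def find_all(seq, pattern):
    """Return the start indices of every occurrence of pattern in seq."""
    n = len(pattern)
    return [i for i in range(len(seq) - n + 1) if seq[i:i + n] == pattern]


def mask_before_last_response(input_ids, response_ids):
    """Mimic the no-instruction_template path: mask everything before the
    LAST response template, so only the final assistant turn is trained."""
    labels = [IGNORE_INDEX] * len(input_ids)
    starts = find_all(input_ids, response_ids)
    if starts:
        begin = starts[-1] + len(response_ids)
        labels[begin:] = input_ids[begin:]
    return labels


# Toy sequence: 9 stands for the assistant template, 5 for tool-result tokens.
labels = mask_before_last_response([8, 1, 9, 2, 5, 5, 9, 3], [9])
# Everything before the last 9 is ignored; only the final answer token trains.
```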

Case 2: with instruction_template

With instruction_template, the collator masks everything between an instruction template and a response template. Since there is no instruction template before the tool result, the tool result does not get masked properly.

INSTRUCTION_TEMPLATE = "<|im_start|>user\n"

old_with_inst = DataCollatorForCompletionOnlyLM(
    response_template=RESPONSE_TEMPLATE,
    instruction_template=INSTRUCTION_TEMPLATE,
    tokenizer=tokenizer,
)
b2 = old_with_inst.torch_call([token_ids])
labels_B = b2["labels"][0].tolist()
summarise("Case B", labels_B, N)
show_masking(b2["input_ids"][0].tolist(), labels_B, tokenizer)
[screenshot: Case B masking output]

In case 2, the loss sees the tool result, which is incorrect:

...
<|im_start|>assistant
<tool_call>
{"name": "get_weather", "arguments": {"location": "Paris"}}
</tool_call><|im_end|>
...
<|im_start|>assistant
The weather in Paris is sunny and 18°C.<|im_end|>
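The pairing logic that causes the leak can be sketched the same way (again a toy reimplementation, not TRL's code):

```python
IGNORE_INDEX = -100


def find_all(seq, pattern):
    """Return the start indices of every occurrence of pattern in seq."""
    n = len(pattern)
    return [i for i in range(len(seq) - n + 1) if seq[i:i + n] == pattern]


def mask_with_pairs(input_ids, instruction_ids, response_ids):
    """Mimic the instruction_template path: train from the end of each
    response template up to the next instruction template. A tool-result
    turn with no instruction template in front of it falls inside a
    trained span and leaks into the loss."""
    labels = [IGNORE_INDEX] * len(input_ids)
    inst_starts = find_all(input_ids, instruction_ids)
    for start in find_all(input_ids, response_ids):
        begin = start + len(response_ids)
        end = next((j for j in inst_starts if j > begin), len(input_ids))
        labels[begin:end] = input_ids[begin:end]
    return labels


# 8 = user template, 9 = assistant template, 5 = tool-result tokens.
labels = mask_with_pairs([8, 1, 9, 2, 5, 5, 9, 3], [8], [9])
# The tool-result tokens (5, 5) end up UNMASKED: there is no user turn
# between the assistant tool call and the tool result.
```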

To confirm my hypothesis that the missing instruction template before the tool result is the cause, I experimented with adding a user turn before the tool call; masking works correctly in that case.

[screenshot: masking with a user turn added before the tool call]

ToolAwareCompletionsCollator

With the new ToolAwareCompletionsCollator, tool results are masked properly.

Without instruction_template
[screenshot: tool results masked correctly without instruction_template]

With instruction_template
[screenshot: tool results masked correctly with instruction_template]
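The tool-aware fix can be sketched as span-based masking: unmask only spans that open with the assistant template and close at the next end-of-turn marker (a toy reimplementation; the real class name and parameters follow the PR description):

```python
IGNORE_INDEX = -100


def find_all(seq, pattern):
    """Return the start indices of every occurrence of pattern in seq."""
    n = len(pattern)
    return [i for i in range(len(seq) - n + 1) if seq[i:i + n] == pattern]


def mask_tool_aware(input_ids, response_ids, end_of_turn_ids):
    """Unmask ONLY assistant spans: each span starts right after a response
    template and ends at the next end-of-turn marker. Tool-result turns
    never start with the response template, so they always stay masked,
    with or without an instruction template."""
    labels = [IGNORE_INDEX] * len(input_ids)
    eot_starts = find_all(input_ids, end_of_turn_ids)
    for start in find_all(input_ids, response_ids):
        begin = start + len(response_ids)
        end = next((j for j in eot_starts if j >= begin), len(input_ids))
        labels[begin:end] = input_ids[begin:end]
    return labels


# 8 = user, 9 = assistant, 7 = end-of-turn, 5 = tool-result tokens.
labels = mask_tool_aware([8, 1, 7, 9, 2, 7, 5, 5, 7, 9, 3, 7], [9], [7])
# Only the assistant content tokens (2 and 3) are trained; the tool-result
# tokens (5, 5) stay masked.
```

Note that this sketch, like TRL's collator, excludes the end-of-turn token itself from the trained span, which is relevant to the EOT discussion later in this thread.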

Related issues

Fixes # (issue)

Before submitting

  • This PR only changes documentation. (You can ignore the following checks in that case)
  • Did you read the contributor guideline Pull Request guidelines?
  • Did you link the issue(s) related to this PR in the section above?
  • Did you add / update tests where needed?

Reviewers

At least one review from a member of oumi-ai/oumi-staff is required.

Adds a new collator that masks labels for non-tool-call assistant
completions, enabling models to learn tool-calling behavior specifically.
Register ToolAwareCompletionsCollator in the collator builder and add
the corresponding CollatorType enum. Update collator docstrings and tests.
Replace hardcoded IGNORE = -100 with the shared constant from
oumi.core.constants to be consistent with sibling test files.
@shanghongsim shanghongsim marked this pull request as ready for review April 9, 2026 00:12
@gitar-bot

gitar-bot bot commented Apr 9, 2026


Comment thread: src/oumi/builders/collators.py (Outdated)
The builder was not forwarding label_ignore_index to the collator
constructor, so it always used the default -100 instead of the
configured value. Add the missing parameter and a builder test.

import oumi.core.constants as constants
from oumi.core.collators.text_collator_with_padding import TextCollatorWithPadding
from oumi.core.collators.text_completions_collator_with_padding import (
Contributor


It seems like it inherits padding support from the base class, should we add "WithPadding" to its name? Or are we moving away from that naming?

Contributor Author


Just to clarify: this class actually uses composition rather than inheritance (it wraps a TRL DataCollatorForCompletionOnlyLM instance internally). The padding behavior comes from that inner collator's chain, so the "WithPadding" in the name describes what the collator does, not something it inherits. That said, I agree the naming could be better. @oelachqar actually suggested refactoring all collators into one universal collator with options (padding=True, completion_only=True); I'm exploring now whether that can be done in this PR.
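The composition pattern described above can be sketched as follows (a simplified illustration; the real class in oumi.core.collators has more responsibilities):

```python
class TextCompletionsCollatorWithPadding:
    """Simplified sketch: the collator WRAPS an inner completion-only
    collator rather than inheriting from it; padding and label masking
    come from the wrapped object."""

    def __init__(self, inner_collator):
        self._collator = inner_collator  # composition, not inheritance

    def __call__(self, batch):
        # Delegate tokenization/padding/masking to the inner collator.
        return self._collator(batch)


# Usage with a stand-in inner collator (a real one would be a TRL-style
# completion-only collator):
wrapped = TextCompletionsCollatorWithPadding(lambda batch: {"n": len(batch)})
```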

Comment thread: src/oumi/builders/collators.py (Outdated)
mask_tool_calls = kwargs.pop("mask_tool_calls", False)
tool_call_start_template = kwargs.pop("tool_call_start_template", None)

if not response_template:
Contributor


If I remember correctly we made a collator that can identify this itself?

Contributor Author


Based on my understanding and research, there isn't a collator in the codebase that auto-identifies the response template; they all need it provided explicitly or fall back to Llama defaults.

# Default to Llama-style templates if not provided

Extend the existing collator with optional end_of_turn_template,
mask_tool_calls, and tool_call_start_template parameters for
span-based label masking in tool-calling conversations. Remove the
separate ToolAwareCompletionsCollator class and forward
label_ignore_index through TextCompletionsCollatorWithPadding.
Comment on lines +195 to +205
seq,
content_start,
content_end,
self.tool_call_start_token_ids,
):
continue

# Step 5: unmask this assistant response span.
batch["labels"][i, content_start:content_end] = batch["input_ids"][
i, content_start:content_end
]


Bug: The _apply_span_masking method incorrectly excludes EOT tokens from training labels, preventing the model from learning to emit stopping signals.
Severity: HIGH

Suggested Fix

To include the EOT tokens in the training labels, the content_end calculation in _apply_span_masking should be adjusted. It should be set to content_start + eot_positions[0] + len(eot_ids) to ensure the slice includes the full EOT token sequence in the unmasked labels.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: src/oumi/core/collators/trl_data_collator_for_completion_only_lm.py#L184-L205

Potential issue: When using `end_of_turn_template` for span-based masking, the
`_apply_span_masking` method incorrectly excludes end-of-turn (EOT) tokens from the
training labels. The slice `batch["labels"][i, content_start:content_end]` is calculated
such that `content_end` marks the beginning of the EOT token sequence, effectively
masking it. Standard supervised fine-tuning requires including these tokens in the loss
so the model learns to generate them as stopping signals. Models trained with this
collator will not learn when to stop generating, leading to uncontrolled output during
inference. This behavior appears to be an unintended side effect of a change focused on
masking tool responses.

Contributor Author


This should not be an issue: TRL's collator also masks out end-of-turn (EOT) tokens.

@shanghongsim
Contributor Author

This is v1.

v2 with improved interface is in these PRs:

#2368

#2369
