
fix(google-genai): include cached_content_token_count in streaming responses #3177

Merged
bogdankostic merged 6 commits into deepset-ai:main from Aftabbs:fix/cached-content-token-count-streaming
Apr 24, 2026

Conversation

Contributor

@Aftabbs Aftabbs commented Apr 17, 2026

Summary

Fixes #3168

cached_content_token_count from the Google GenAI API usage metadata was
being populated in non-streaming responses but was completely absent
from streaming responses.

Root cause (two missing pieces):

  1. _convert_google_chunk_to_streaming_chunk only checked for
    thoughts_token_count but had no equivalent check for
    cached_content_token_count. The usage dict built for each chunk
    therefore never included the cached token count.

  2. _aggregate_streaming_chunks_with_reasoning forwarded
    thoughts_token_count from the final chunk into the aggregated
    message, but had no corresponding logic for
    cached_content_token_count.

Fix:

  • Add an explicit cached_content_token_count check in
    _convert_google_chunk_to_streaming_chunk, matching the pattern
    already used by the non-streaming path.
  • Propagate cached_content_token_count through
    _aggregate_streaming_chunks_with_reasoning, mirroring how
    thoughts_token_count is handled.
  • Add two unit tests covering the streaming chunk conversion and the
    aggregator.

How to test

from haystack.dataclasses import StreamingChunk, ComponentInfo
from haystack_integrations.components.generators.google_genai.chat.utils import (
    _convert_google_chunk_to_streaming_chunk,
    _aggregate_streaming_chunks_with_reasoning,
)
from unittest.mock import Mock

# Streaming chunk with cached tokens
mock_usage = Mock()
mock_usage.prompt_token_count = 1000
mock_usage.candidates_token_count = 10
mock_usage.total_token_count = 1010
mock_usage.thoughts_token_count = None
mock_usage.cached_content_token_count = 800

mock_chunk = Mock()
mock_chunk.candidates = [mock_candidate]  # ... (candidate content elided) ...
mock_chunk.usage_metadata = mock_usage

chunk = _convert_google_chunk_to_streaming_chunk(mock_chunk, 0, component_info, "gemini-2.5-flash")
assert chunk.meta["usage"]["cached_content_token_count"] == 800

# Aggregated message (chunk1 and final_chunk built the same way)
result = _aggregate_streaming_chunks_with_reasoning([chunk1, final_chunk])
assert result.meta["usage"]["cached_content_token_count"] == 800

fix(google-genai): include cached_content_token_count in streaming responses

In streaming mode, `_convert_google_chunk_to_streaming_chunk` only
extracted `thoughts_token_count` but had no check for
`cached_content_token_count`. The same field was also missing from
the aggregation pass in `_aggregate_streaming_chunks_with_reasoning`.

Add the same explicit check pattern used by the non-streaming path
and propagate the value through the streaming aggregator so
`meta['usage']['cached_content_token_count']` is populated when
context caching is active.

Fixes deepset-ai#3168
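The aggregation half of the fix can be sketched as follows. This is a deliberately simplified standalone version: the name `aggregate_usage` and the plain-dict chunk representation are illustrative only (the real helper, `_aggregate_streaming_chunks_with_reasoning`, operates on `StreamingChunk` objects and builds a `ChatMessage`), and it assumes the final chunk carries the cumulative usage counts:

```python
def aggregate_usage(chunk_usages):
    """Illustrative sketch: forward cached_content_token_count from the
    final chunk's usage into the aggregated usage, mirroring the
    existing thoughts_token_count handling."""
    final = chunk_usages[-1]
    # Base counts come from the final chunk.
    usage = {
        "prompt_tokens": final.get("prompt_tokens", 0),
        "completion_tokens": final.get("completion_tokens", 0),
        "total_tokens": final.get("total_tokens", 0),
    }
    # Pre-existing behavior: forward reasoning token usage when present.
    if "thoughts_token_count" in final:
        usage["thoughts_token_count"] = final["thoughts_token_count"]
    # The fix: forward cached context token usage the same way.
    if "cached_content_token_count" in final:
        usage["cached_content_token_count"] = final["cached_content_token_count"]
    return usage

print(aggregate_usage([
    {},  # intermediate chunk without usage
    {"prompt_tokens": 1000, "completion_tokens": 10, "total_tokens": 1010,
     "cached_content_token_count": 800},
]))
```

When no caching is active, the final chunk simply lacks the key and the aggregated usage is unchanged.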
@Aftabbs Aftabbs requested a review from a team as a code owner April 17, 2026 14:56
@Aftabbs Aftabbs requested review from julian-risch and removed request for a team April 17, 2026 14:56
@github-actions github-actions Bot added integration:google-genai type:documentation Improvements or additions to documentation labels Apr 17, 2026
@julian-risch julian-risch requested a review from Copilot April 17, 2026 14:58

Copilot AI left a comment


Pull request overview

Fixes missing cached_content_token_count in streaming responses for GoogleGenAIChatGenerator by extracting it from chunk usage metadata and propagating it into the aggregated ChatMessage usage metadata.

Changes:

  • Add cached_content_token_count extraction in _convert_google_chunk_to_streaming_chunk.
  • Propagate cached_content_token_count from streaming chunks into _aggregate_streaming_chunks_with_reasoning.
  • Add unit tests covering both the per-chunk conversion and final aggregation behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Reviewed files:

  • integrations/google_genai/src/haystack_integrations/components/generators/google_genai/chat/utils.py — Extract and forward cached_content_token_count in streaming chunk conversion and aggregation.
  • integrations/google_genai/tests/test_chat_generator_utils.py — Add tests ensuring the cached token count appears in streaming chunk meta.usage and aggregated message meta.usage.
  • integrations/google_genai/CHANGELOG.md — Document the bug fix under [Unreleased].


Comment on lines +629 to +634
if (
    usage_metadata
    and hasattr(usage_metadata, "cached_content_token_count")
    and usage_metadata.cached_content_token_count
):
    usage["cached_content_token_count"] = usage_metadata.cached_content_token_count

Copilot AI Apr 17, 2026


The condition and usage_metadata.cached_content_token_count treats 0 as absent, so cached_content_token_count=0 would be dropped even if the API explicitly returns it. Use an is not None check (e.g., assign via getattr(..., None) and compare to None) to preserve valid zero values and match the intent of “if available”.

Suggested change

Before:

    if (
        usage_metadata
        and hasattr(usage_metadata, "cached_content_token_count")
        and usage_metadata.cached_content_token_count
    ):
        usage["cached_content_token_count"] = usage_metadata.cached_content_token_count

After:

    cached_content_token_count = (
        getattr(usage_metadata, "cached_content_token_count", None) if usage_metadata else None
    )
    if cached_content_token_count is not None:
        usage["cached_content_token_count"] = cached_content_token_count
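The difference matters only for an explicit zero, but it is real. A minimal repro of the behavior the comment describes (helper names here are hypothetical, written just to contrast the two checks):

```python
from types import SimpleNamespace

def truthy_check(meta):
    # Pattern from the original diff: a falsy value (including 0) is dropped.
    usage = {}
    if meta and hasattr(meta, "cached_content_token_count") and meta.cached_content_token_count:
        usage["cached_content_token_count"] = meta.cached_content_token_count
    return usage

def none_check(meta):
    # Suggested pattern: only a missing/None value is dropped; 0 survives.
    usage = {}
    cached = getattr(meta, "cached_content_token_count", None) if meta else None
    if cached is not None:
        usage["cached_content_token_count"] = cached
    return usage

meta = SimpleNamespace(cached_content_token_count=0)  # API explicitly reports 0
print(truthy_check(meta))  # {}  -- the zero is silently dropped
print(none_check(meta))    # {'cached_content_token_count': 0}
```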

mock_chunk.candidates = [mock_candidate]
mock_chunk.usage_metadata = mock_usage

chunk = _convert_google_chunk_to_streaming_chunk(mock_chunk, 0, component_info, "gemini-2.5-flash")

Copilot AI Apr 17, 2026


Call _convert_google_chunk_to_streaming_chunk using named keyword arguments (as done elsewhere in this test module) to keep the test consistent and resilient to parameter reordering.

Suggested change

Before:

    chunk = _convert_google_chunk_to_streaming_chunk(mock_chunk, 0, component_info, "gemini-2.5-flash")

After:

    chunk = _convert_google_chunk_to_streaming_chunk(
        chunk=mock_chunk,
        candidate_index=0,
        component_info=component_info,
        model="gemini-2.5-flash",
    )

@julian-risch julian-risch requested review from bogdankostic and removed request for julian-risch April 20, 2026 07:39
Contributor

github-actions Bot commented Apr 20, 2026

Coverage report (google_genai)

File: integrations/google_genai/src/haystack_integrations/components/generators/google_genai/chat/utils.py
Lines missing coverage (new statements): 550, 748, 754
This report was generated by python-coverage-comment-action


@bogdankostic bogdankostic left a comment


Thanks @Aftabbs, looks good overall, just left two minor comments.


Please revert the changes made here, the Changelog is generated automatically whenever we release a new version.


Let's update this comment; we don't only extract thinking token usage here.

- Revert CHANGELOG.md (auto-generated, should not be manually edited)
- Broaden comment in _aggregate_streaming_chunks_with_reasoning to
  reflect that both thoughts_token_count and cached_content_token_count
  are extracted, not only thinking token usage

@bogdankostic bogdankostic left a comment


Looking good to me, thanks @Aftabbs!

@bogdankostic bogdankostic merged commit 6bd3f38 into deepset-ai:main Apr 24, 2026
13 checks passed

Labels

integration:google-genai type:documentation Improvements or additions to documentation

Development

Successfully merging this pull request may close these issues.

cached_content_token_count is missing from streaming responses in Google GenAI chat generator

3 participants