
fix(google-genai): include cached_content_token_count in streaming responses #3177

Merged
bogdankostic merged 6 commits into deepset-ai:main from Aftabbs:fix/cached-content-token-count-streaming
Apr 24, 2026

Conversation

Contributor

@Aftabbs Aftabbs commented Apr 17, 2026

Summary

Fixes #3168

cached_content_token_count from the Google GenAI API usage metadata was
being populated in non-streaming responses but was completely absent
from streaming responses.

Root cause (two missing pieces):

  1. _convert_google_chunk_to_streaming_chunk only checked for
    thoughts_token_count but had no equivalent check for
    cached_content_token_count. The usage dict built for each chunk
    therefore never included the cached token count.

  2. _aggregate_streaming_chunks_with_reasoning forwarded
    thoughts_token_count from the final chunk into the aggregated
    message, but had no corresponding logic for
    cached_content_token_count.

Fix:

  • Add an explicit cached_content_token_count check in
    _convert_google_chunk_to_streaming_chunk, matching the pattern
    already used by the non-streaming path.
  • Propagate cached_content_token_count through
    _aggregate_streaming_chunks_with_reasoning, mirroring how
    thoughts_token_count is handled.
  • Add two unit tests covering the streaming chunk conversion and the
    aggregator.

How to test

from haystack.dataclasses import StreamingChunk, ComponentInfo
from haystack_integrations.components.generators.google_genai.chat.utils import (
    _convert_google_chunk_to_streaming_chunk,
    _aggregate_streaming_chunks_with_reasoning,
)
from unittest.mock import Mock

# Streaming chunk with cached tokens
mock_usage = Mock()
mock_usage.prompt_token_count = 1000
mock_usage.candidates_token_count = 10
mock_usage.total_token_count = 1010
mock_usage.thoughts_token_count = None
mock_usage.cached_content_token_count = 800

mock_chunk = Mock()
mock_chunk.candidates = [mock_candidate]  # ... (candidate content elided) ...
mock_chunk.usage_metadata = mock_usage

chunk = _convert_google_chunk_to_streaming_chunk(mock_chunk, 0, component_info, "gemini-2.5-flash")
assert chunk.meta["usage"]["cached_content_token_count"] == 800

# Aggregated message (chunk1 and final_chunk built the same way)
result = _aggregate_streaming_chunks_with_reasoning([chunk1, final_chunk])
assert result.meta["usage"]["cached_content_token_count"] == 800

fix(google-genai): include cached_content_token_count in streaming responses

In streaming mode, `_convert_google_chunk_to_streaming_chunk` only
extracted `thoughts_token_count` but had no check for
`cached_content_token_count`. The same field was also missing from
the aggregation pass in `_aggregate_streaming_chunks_with_reasoning`.

Add the same explicit check pattern used by the non-streaming path
and propagate the value through the streaming aggregator so
`meta['usage']['cached_content_token_count']` is populated when
context caching is active.

Fixes deepset-ai#3168
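The aggregation half of the fix can be sketched as follows. This is a deliberately simplified standalone version: the name `aggregate_usage` and the plain-dict chunk representation are illustrative only (the real helper, `_aggregate_streaming_chunks_with_reasoning`, operates on `StreamingChunk` objects and builds a `ChatMessage`), and it assumes the final chunk carries the cumulative usage counts:

```python
def aggregate_usage(chunk_usages):
    """Illustrative sketch: forward cached_content_token_count from the
    final chunk's usage into the aggregated usage, mirroring the
    existing thoughts_token_count handling."""
    final = chunk_usages[-1]
    # Base counts come from the final chunk.
    usage = {
        "prompt_tokens": final.get("prompt_tokens", 0),
        "completion_tokens": final.get("completion_tokens", 0),
        "total_tokens": final.get("total_tokens", 0),
    }
    # Pre-existing behavior: forward reasoning token usage when present.
    if "thoughts_token_count" in final:
        usage["thoughts_token_count"] = final["thoughts_token_count"]
    # The fix: forward cached context token usage the same way.
    if "cached_content_token_count" in final:
        usage["cached_content_token_count"] = final["cached_content_token_count"]
    return usage

print(aggregate_usage([
    {},  # intermediate chunk without usage
    {"prompt_tokens": 1000, "completion_tokens": 10, "total_tokens": 1010,
     "cached_content_token_count": 800},
]))
```

When no caching is active, the final chunk simply lacks the key and the aggregated usage is unchanged.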
@Aftabbs Aftabbs requested a review from a team as a code owner April 17, 2026 14:56
@Aftabbs Aftabbs requested review from julian-risch and removed request for a team April 17, 2026 14:56
@github-actions github-actions Bot added integration:google-genai type:documentation Improvements or additions to documentation labels Apr 17, 2026
@julian-risch julian-risch requested a review from Copilot April 17, 2026 14:58

Copilot AI left a comment


Pull request overview

Fixes missing cached_content_token_count in streaming responses for GoogleGenAIChatGenerator by extracting it from chunk usage metadata and propagating it into the aggregated ChatMessage usage metadata.

Changes:

  • Add cached_content_token_count extraction in _convert_google_chunk_to_streaming_chunk.
  • Propagate cached_content_token_count from streaming chunks into _aggregate_streaming_chunks_with_reasoning.
  • Add unit tests covering both the per-chunk conversion and final aggregation behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Reviewed files:

  • integrations/google_genai/src/haystack_integrations/components/generators/google_genai/chat/utils.py — Extract and forward cached_content_token_count in streaming chunk conversion and aggregation.
  • integrations/google_genai/tests/test_chat_generator_utils.py — Add tests ensuring the cached token count appears in streaming chunk meta.usage and aggregated message meta.usage.
  • integrations/google_genai/CHANGELOG.md — Document the bug fix under [Unreleased].


Comment on lines +629 to +634
if (
    usage_metadata
    and hasattr(usage_metadata, "cached_content_token_count")
    and usage_metadata.cached_content_token_count
):
    usage["cached_content_token_count"] = usage_metadata.cached_content_token_count

Copilot AI Apr 17, 2026


The condition and usage_metadata.cached_content_token_count treats 0 as absent, so cached_content_token_count=0 would be dropped even if the API explicitly returns it. Use an is not None check (e.g., assign via getattr(..., None) and compare to None) to preserve valid zero values and match the intent of “if available”.

Suggested change

Before:

    if (
        usage_metadata
        and hasattr(usage_metadata, "cached_content_token_count")
        and usage_metadata.cached_content_token_count
    ):
        usage["cached_content_token_count"] = usage_metadata.cached_content_token_count

After:

    cached_content_token_count = (
        getattr(usage_metadata, "cached_content_token_count", None) if usage_metadata else None
    )
    if cached_content_token_count is not None:
        usage["cached_content_token_count"] = cached_content_token_count
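The difference matters only for an explicit zero, but it is real. A minimal repro of the behavior the comment describes (helper names here are hypothetical, written just to contrast the two checks):

```python
from types import SimpleNamespace

def truthy_check(meta):
    # Pattern from the original diff: a falsy value (including 0) is dropped.
    usage = {}
    if meta and hasattr(meta, "cached_content_token_count") and meta.cached_content_token_count:
        usage["cached_content_token_count"] = meta.cached_content_token_count
    return usage

def none_check(meta):
    # Suggested pattern: only a missing/None value is dropped; 0 survives.
    usage = {}
    cached = getattr(meta, "cached_content_token_count", None) if meta else None
    if cached is not None:
        usage["cached_content_token_count"] = cached
    return usage

meta = SimpleNamespace(cached_content_token_count=0)  # API explicitly reports 0
print(truthy_check(meta))  # {}  -- the zero is silently dropped
print(none_check(meta))    # {'cached_content_token_count': 0}
```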

mock_chunk.candidates = [mock_candidate]
mock_chunk.usage_metadata = mock_usage

chunk = _convert_google_chunk_to_streaming_chunk(mock_chunk, 0, component_info, "gemini-2.5-flash")

Copilot AI Apr 17, 2026


Call _convert_google_chunk_to_streaming_chunk using named keyword arguments (as done elsewhere in this test module) to keep the test consistent and resilient to parameter reordering.

Suggested change

Before:

    chunk = _convert_google_chunk_to_streaming_chunk(mock_chunk, 0, component_info, "gemini-2.5-flash")

After:

    chunk = _convert_google_chunk_to_streaming_chunk(
        chunk=mock_chunk,
        candidate_index=0,
        component_info=component_info,
        model="gemini-2.5-flash",
    )

@julian-risch julian-risch requested review from bogdankostic and removed request for julian-risch April 20, 2026 07:39
Contributor

github-actions Bot commented Apr 20, 2026

Coverage report (google_genai)

File: integrations/google_genai/src/haystack_integrations/components/generators/google_genai/chat/utils.py
Lines missing coverage (new statements): 550, 748, 754
This report was generated by python-coverage-comment-action


@bogdankostic bogdankostic left a comment


Thanks @Aftabbs, looks good overall, just left two minor comments.


Please revert the changes made here, the Changelog is generated automatically whenever we release a new version.


Let's update this comment; we don't only extract thinking token usage here.

- Revert CHANGELOG.md (auto-generated, should not be manually edited)
- Broaden comment in _aggregate_streaming_chunks_with_reasoning to
  reflect that both thoughts_token_count and cached_content_token_count
  are extracted, not only thinking token usage

@bogdankostic bogdankostic left a comment


Looking good to me, thanks @Aftabbs!

@bogdankostic bogdankostic merged commit 6bd3f38 into deepset-ai:main Apr 24, 2026
13 checks passed

Labels

integration:google-genai type:documentation Improvements or additions to documentation

Development

Successfully merging this pull request may close these issues.

cached_content_token_count is missing from streaming responses in Google GenAI chat generator

3 participants