fix(google-genai): include cached_content_token_count in streaming responses #3177
Conversation
In streaming mode, `_convert_google_chunk_to_streaming_chunk` only extracted `thoughts_token_count` but had no check for `cached_content_token_count`. The same field was also missing from the aggregation pass in `_aggregate_streaming_chunks_with_reasoning`. Add the same explicit check pattern used by the non-streaming path and propagate the value through the streaming aggregator so `meta['usage']['cached_content_token_count']` is populated when context caching is active. Fixes deepset-ai#3168
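For illustration, the aggregated message metadata should look roughly like this once the fix is in place. This is a sketch with made-up token numbers; only the `cached_content_token_count` key under `meta['usage']` comes from this PR, and the other key names are assumptions, not verified against the integration:

```python
# Illustrative only: hypothetical aggregated metadata after the fix.
# Key names other than cached_content_token_count are assumptions.
meta = {
    "model": "gemini-2.5-flash",
    "usage": {
        "prompt_token_count": 1200,
        "candidates_token_count": 80,
        "thoughts_token_count": 40,
        "cached_content_token_count": 1024,  # populated in streaming mode after this fix
    },
}
print(meta["usage"]["cached_content_token_count"])  # -> 1024
```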
Pull request overview
Fixes missing `cached_content_token_count` in streaming responses for `GoogleGenAIChatGenerator` by extracting it from chunk usage metadata and propagating it into the aggregated `ChatMessage` usage metadata.
Changes:
- Add `cached_content_token_count` extraction in `_convert_google_chunk_to_streaming_chunk`.
- Propagate `cached_content_token_count` from streaming chunks into `_aggregate_streaming_chunks_with_reasoning`.
- Add unit tests covering both the per-chunk conversion and final aggregation behavior.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| integrations/google_genai/src/haystack_integrations/components/generators/google_genai/chat/utils.py | Extract and forward `cached_content_token_count` in streaming chunk conversion and aggregation. |
| integrations/google_genai/tests/test_chat_generator_utils.py | Add tests ensuring cached token count appears in streaming chunk `meta.usage` and aggregated message `meta.usage`. |
| integrations/google_genai/CHANGELOG.md | Document the bug fix under [Unreleased]. |
```python
if (
    usage_metadata
    and hasattr(usage_metadata, "cached_content_token_count")
    and usage_metadata.cached_content_token_count
):
    usage["cached_content_token_count"] = usage_metadata.cached_content_token_count
```
The condition and usage_metadata.cached_content_token_count treats 0 as absent, so cached_content_token_count=0 would be dropped even if the API explicitly returns it. Use an is not None check (e.g., assign via getattr(..., None) and compare to None) to preserve valid zero values and match the intent of “if available”.
```diff
-if (
-    usage_metadata
-    and hasattr(usage_metadata, "cached_content_token_count")
-    and usage_metadata.cached_content_token_count
-):
-    usage["cached_content_token_count"] = usage_metadata.cached_content_token_count
+cached_content_token_count = getattr(usage_metadata, "cached_content_token_count", None) if usage_metadata else None
+if cached_content_token_count is not None:
+    usage["cached_content_token_count"] = cached_content_token_count
```
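To make the reviewer's point concrete, here is a minimal, self-contained sketch (using a stand-in `FakeUsage` class rather than the real google-genai metadata type) showing how the truthiness check drops an explicit zero while the `is not None` variant keeps it:

```python
# Stand-in for the real usage metadata object; the API may explicitly
# report zero cached tokens.
class FakeUsage:
    cached_content_token_count = 0

usage_metadata = FakeUsage()
usage = {}

# Truthiness-based check (the original pattern): 0 is falsy, so the key is skipped.
if (
    usage_metadata
    and hasattr(usage_metadata, "cached_content_token_count")
    and usage_metadata.cached_content_token_count
):
    usage["cached_content_token_count"] = usage_metadata.cached_content_token_count
print("truthiness check kept the key:", "cached_content_token_count" in usage)  # -> False

# `is not None` check (the suggested pattern) preserves the explicit zero.
value = getattr(usage_metadata, "cached_content_token_count", None) if usage_metadata else None
if value is not None:
    usage["cached_content_token_count"] = value
print("is-not-None check kept the key:", "cached_content_token_count" in usage)  # -> True
```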
```python
mock_chunk.candidates = [mock_candidate]
mock_chunk.usage_metadata = mock_usage

chunk = _convert_google_chunk_to_streaming_chunk(mock_chunk, 0, component_info, "gemini-2.5-flash")
```
Call _convert_google_chunk_to_streaming_chunk using named keyword arguments (as done elsewhere in this test module) to keep the test consistent and resilient to parameter reordering.
```diff
-chunk = _convert_google_chunk_to_streaming_chunk(mock_chunk, 0, component_info, "gemini-2.5-flash")
+chunk = _convert_google_chunk_to_streaming_chunk(
+    chunk=mock_chunk,
+    candidate_index=0,
+    component_info=component_info,
+    model="gemini-2.5-flash",
+)
```
bogdankostic
left a comment
Thanks @Aftabbs, looks good overall, just left two minor comments.
Please revert the changes made here, the Changelog is generated automatically whenever we release a new version.
Let's update this comment, we don't only extract thinking token usage here.
- Revert CHANGELOG.md (auto-generated, should not be manually edited)
- Broaden the comment in `_aggregate_streaming_chunks_with_reasoning` to reflect that both `thoughts_token_count` and `cached_content_token_count` are extracted, not only thinking token usage
bogdankostic
left a comment
Looking good to me, thanks @Aftabbs!
Summary
Fixes #3168

`cached_content_token_count` from the Google GenAI API usage metadata was being populated in non-streaming responses but was completely absent from streaming responses.

Root cause (two missing pieces):
- `_convert_google_chunk_to_streaming_chunk` only checked for `thoughts_token_count` but had no equivalent check for `cached_content_token_count`. The usage dict built for each chunk therefore never included the cached token count.
- `_aggregate_streaming_chunks_with_reasoning` forwarded `thoughts_token_count` from the final chunk into the aggregated message, but had no corresponding logic for `cached_content_token_count`.

Fix:
- Add a `cached_content_token_count` check in `_convert_google_chunk_to_streaming_chunk`, matching the pattern already used by the non-streaming path.
- Propagate `cached_content_token_count` through `_aggregate_streaming_chunks_with_reasoning`, mirroring how `thoughts_token_count` is handled by the streaming aggregator.
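The aggregation side of the fix can be sketched as follows. This is a simplified stand-in, not the real `_aggregate_streaming_chunks_with_reasoning` implementation; the chunk shape and the `aggregate_usage` helper name are assumptions for illustration only:

```python
def aggregate_usage(chunks):
    # Simplified sketch: forward both token counts from the final chunk's
    # usage metadata into the aggregated message usage, skipping only None
    # so that explicit zeros are preserved.
    usage = {}
    final_usage = chunks[-1].get("usage") or {}
    for key in ("thoughts_token_count", "cached_content_token_count"):
        value = final_usage.get(key)
        if value is not None:  # keep explicit zeros
            usage[key] = value
    return usage

chunks = [
    {"content": "Hel", "usage": None},
    {"content": "lo", "usage": {"thoughts_token_count": 12, "cached_content_token_count": 256}},
]
print(aggregate_usage(chunks))  # -> {'thoughts_token_count': 12, 'cached_content_token_count': 256}
```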
How to test