feat: aggr stats partial eval #7995

Draft
discord9 wants to merge 6 commits into main from impl_aggr_stats_partial_eval

@discord9
Contributor

@discord9 discord9 commented Apr 20, 2026

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

Mostly tests.

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.
  • API changes are backward compatible.
  • Schema or data changes are backward compatible.

discord9 added 6 commits April 8, 2026 16:47
Signed-off-by: discord9 <discord9@163.com>
Signed-off-by: discord9 <discord9@163.com>
Signed-off-by: discord9 <discord9@163.com>
Signed-off-by: discord9 <discord9@163.com>
Signed-off-by: discord9 <discord9@163.com>
Signed-off-by: discord9 <discord9@163.com>
@github-actions github-actions Bot added the size/XXL, documentation, and docs-not-required (This change does not impact docs.) labels Apr 20, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces the AggregateStats physical optimizer pass, which optimizes aggregate queries by synthesizing partial states from per-file statistics instead of scanning eligible files. It supports MIN, MAX, and COUNT for append-only tables and includes updates to RegionScanExec to handle file exclusion. A performance improvement opportunity was identified regarding the eager materialization of statistics for all columns, which could be optimized by only processing columns required for the specific aggregate.
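To make the idea in the summary concrete, here is a minimal sketch of folding per-file statistics into a partial MIN/MAX/COUNT state. It is not the code from this PR: `FileColumnStats`, `PartialState`, and `fold_stats` are hypothetical names, the column is assumed to be `i64`, and a file is treated as eligible only when its min, max, and null count are all present. The append-only restriction matters because, without deletes or deduplication, a file's statistics can be assumed to describe exactly the rows a scan would return.

```rust
/// Hypothetical per-file statistics for one i64 column (illustration only).
#[derive(Debug, Clone, Copy)]
struct FileColumnStats {
    min: Option<i64>,
    max: Option<i64>,
    row_count: u64,
    null_count: Option<u64>,
}

/// Partial aggregate state that a later merge step could combine
/// with states produced by actually scanning the remaining files.
#[derive(Debug, Default, PartialEq)]
struct PartialState {
    min: Option<i64>,
    max: Option<i64>,
    /// COUNT(col): number of non-null values seen so far.
    count: u64,
}

/// Folds the stats of every eligible file into one partial state and
/// returns the indices of files that still have to be scanned.
fn fold_stats(files: &[FileColumnStats]) -> (PartialState, Vec<usize>) {
    let mut state = PartialState::default();
    let mut must_scan = Vec::new();
    for (idx, f) in files.iter().enumerate() {
        match (f.min, f.max, f.null_count) {
            // Complete stats: answer from metadata, skip the file.
            (Some(lo), Some(hi), Some(nulls)) if nulls <= f.row_count => {
                state.min = Some(state.min.map_or(lo, |m| m.min(lo)));
                state.max = Some(state.max.map_or(hi, |m| m.max(hi)));
                state.count += f.row_count - nulls;
            }
            // Missing or inconsistent stats: fall back to a real scan.
            _ => must_scan.push(idx),
        }
    }
    (state, must_scan)
}
```

In this sketch the returned index list plays the role of the file-exclusion input: only those files would be handed to the scan, and their scanned partial states would then be merged with the synthesized one.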

Comment on lines +81 to +82
// TODO(ruihang): extract stats only for columns referenced by the supported aggregates
// instead of eagerly materializing every field column for every file.

medium

Eagerly materializing statistics for all field columns in every file can lead to significant overhead in tables with many columns. Consider passing the set of required columns to build_scan_input_stats to avoid unnecessary metadata processing.
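One way the suggestion could look, as a rough sketch rather than the actual signature of `build_scan_input_stats`: derive the set of referenced columns from the supported aggregates first, then look up only those columns in each file's metadata. The `Aggregate` enum, `ColumnStats` struct, and both function names below are assumptions made for illustration.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical per-column statistics stored in file metadata.
#[derive(Debug, Clone, PartialEq)]
struct ColumnStats {
    min: i64,
    max: i64,
    null_count: u64,
}

/// Hypothetical description of the aggregates the pass supports.
enum Aggregate {
    Min(String),
    Max(String),
    Count(String),
    /// COUNT(*) needs only the file row count, no column stats.
    CountStar,
}

/// Collects the columns whose statistics are actually needed.
fn required_columns(aggrs: &[Aggregate]) -> BTreeSet<String> {
    aggrs
        .iter()
        .filter_map(|a| match a {
            Aggregate::Min(c) | Aggregate::Max(c) | Aggregate::Count(c) => Some(c.clone()),
            Aggregate::CountStar => None,
        })
        .collect()
}

/// Materializes stats for the required columns only. The loop runs over
/// `required`, so the cost scales with the number of referenced columns
/// rather than with the table width.
fn collect_required_stats(
    file_stats: &BTreeMap<String, ColumnStats>,
    required: &BTreeSet<String>,
) -> BTreeMap<String, ColumnStats> {
    required
        .iter()
        .filter_map(|name| file_stats.get(name).map(|s| (name.clone(), s.clone())))
        .collect()
}
```

A column that is required but has no entry in a file's metadata is simply absent from the result, which a caller could treat as a signal that the file is not eligible and must be scanned.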

@discord9
Contributor Author

Closed by #8046
