[SPARK-53565][SQL] Fix missing grand total row for ROLLUP/CUBE on runtime-empty tables #54938

Draft
xiaoxuandev wants to merge 1 commit into apache:master from xiaoxuandev:fix-53565

Conversation

Contributor

@xiaoxuandev xiaoxuandev commented Mar 21, 2026

What changes were proposed in this pull request?

Add a new optimizer rule SplitEmptyGroupingSet that splits an Aggregate over Expand containing an empty grouping set into a Union of two branches:

  1. Non-empty branch: Aggregate over Expand with only the non-empty grouping sets.
  2. Grand total branch: a no-group Aggregate that always produces exactly one row.
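
The two branches can be illustrated with a toy model in plain Python. This is only a sketch of the semantics, not the Catalyst rule: the function names, the dict-based rows, and the restriction to `count(*)` are all assumptions made for illustration.

```python
from collections import Counter


def aggregate_over_expand(rows, grouping_sets, columns):
    """Toy Aggregate-over-Expand computing count(*) per grouping set.

    Each input row is replicated once per grouping set, with the columns
    outside that set nulled out. Empty input therefore yields no groups.
    """
    counts = Counter()
    for row in rows:
        for gid, grouping_set in enumerate(grouping_sets):
            key = tuple(row[c] if c in grouping_set else None for c in columns)
            # gid stands in for the grouping ID, keeping sets distinct.
            counts[(gid, key)] += 1
    return [(key, n) for (_gid, key), n in counts.items()]


def no_group_aggregate(rows, columns):
    """Toy no-group Aggregate: always exactly one row, even on empty input."""
    return [(tuple(None for _ in columns), len(rows))]


def split_empty_grouping_set(rows, grouping_sets, columns):
    """Hypothetical analogue of the split: union of the two branches."""
    non_empty = [gs for gs in grouping_sets if gs]
    result = aggregate_over_expand(rows, non_empty, columns) if non_empty else []
    if len(non_empty) < len(grouping_sets):
        # The empty grouping set was present, so add the grand total branch.
        result = result + no_group_aggregate(rows, columns)
    return result


rollup_a = [("a",), ()]  # GROUP BY a WITH ROLLUP
unsplit_on_empty = aggregate_over_expand([], rollup_a, ["a"])
split_on_empty = split_empty_grouping_set([], rollup_a, ["a"])
```

On empty input the unsplit form produces no rows, while the split form still produces the single grand total row. On non-empty input both forms produce the same set of rows.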

The SQL standard requires that the empty grouping set (grand total) in ROLLUP/CUBE/GROUPING SETS always produces one row, even when the input relation is empty at runtime. A no-group Aggregate guarantees this because SQL specifies that an aggregate without GROUP BY returns exactly one row, even on empty input.
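
The difference between grouped and no-group aggregation on empty input can be checked in any SQL engine. The sketch below uses Python's built-in `sqlite3` module rather than Spark, purely so that it runs without a cluster; the table name matches the example query in this description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE empty_table (a INTEGER)")

# Aggregate without GROUP BY: one row even though the table is empty.
no_group_rows = conn.execute("SELECT count(*) FROM empty_table").fetchall()

# Aggregate with GROUP BY: no groups exist, so no rows are returned.
grouped_rows = conn.execute(
    "SELECT a, count(*) FROM empty_table GROUP BY a"
).fetchall()

print(no_group_rows)  # [(0,)]
print(grouped_rows)   # []
```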

The rule runs in a Once batch before PropagateEmptyRelation. PropagateEmptyRelation is simplified to always propagate an empty relation through Expand nodes: after the split, an Expand never contains an empty grouping set, so an Expand over an empty child can never contribute a row.

Why are the changes needed?

Without this fix, queries like:

    SELECT a, count(*) FROM empty_table GROUP BY a WITH ROLLUP

incorrectly return 0 rows instead of the expected grand total row Row(null, 0). This applies to both compile-time empty relations (WHERE FALSE) and runtime-empty tables (real tables with 0 rows).
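
For reference, the expected result can be reproduced by hand-writing the ROLLUP as a UNION ALL of a grouped branch and a grand total branch. This is a SQL-level analogy to the split, run here in SQLite through Python's `sqlite3` module as an assumption-laden stand-in for Spark; it does not exercise the optimizer rule itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE empty_table (a INTEGER)")

# GROUP BY a WITH ROLLUP, written out as its two grouping sets.
emulated_rollup = """
    SELECT a, count(*) FROM empty_table GROUP BY a
    UNION ALL
    SELECT NULL, count(*) FROM empty_table
"""
rollup_rows = conn.execute(emulated_rollup).fetchall()
print(rollup_rows)  # [(None, 0)]
```

The single `(None, 0)` row corresponds to the `Row(null, 0)` that the fixed query is expected to return.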

Does this PR introduce any user-facing change?

Yes. ROLLUP/CUBE/GROUPING SETS queries on empty tables now correctly produce the grand total row when the empty grouping set is present.

How was this patch tested?

  • Unit tests in SplitEmptyGroupingSetSuite: Union split, no-split without empty set, only-empty-set case, grouping ID preservation.
  • Unit test in PropagateEmptyRelationSuite: Expand on empty child is always eliminated.
  • End-to-end tests in DataFrameAggregateSuite: compile-time empty, runtime-empty tables, HAVING, multi-column ROLLUP/CUBE, NOT NULL columns, multiple aggregate functions, non-empty data coexistence.

Was this patch authored or co-authored using generative AI tooling?

Yes, co-authored with Kiro.

@xiaoxuandev xiaoxuandev marked this pull request as draft March 21, 2026 06:18
@xiaoxuandev xiaoxuandev force-pushed the fix-53565 branch 3 times, most recently from 402c388 to ba512b8 March 21, 2026 23:14
@xiaoxuandev xiaoxuandev changed the title from "[SPARK-53565][SQL] Fix aggregate function evaluation for empty grouping sets in PropagateEmptyRelation" to "[SPARK-53565][SQL] Fix missing grand total row for ROLLUP/CUBE on runtime-empty tables" Mar 21, 2026