[SPARK-56147][SQL] spark-sql cli correctly handles SQL Scripting compound blocks #54946

Open

pan3793 wants to merge 3 commits into apache:master from pan3793:SPARK-48338
Conversation

pan3793 (Member) commented Mar 23, 2026

What changes were proposed in this pull request?

The spark-sql CLI now correctly handles SQL Scripting compound blocks (e.g., BEGIN...END, IF...END IF, WHILE...DO...END WHILE, CASE...END CASE) by tracking block nesting depth while processing input.

Changes:

  • Added a SqlScriptBlockTracker class in the companion object that tracks SQL Scripting block depth by scanning keyword tokens (BEGIN, END, CASE, IF, DO, LOOP, REPEAT), while correctly handling the block-name suffixes that follow END (e.g., END IF, END CASE) so they are not counted as new blocks.
  • Updated splitSemiColon to use SqlScriptBlockTracker so semicolons inside compound blocks are not treated as statement boundaries.
  • Updated the interactive input loop to call sqlScriptingBlockDepth and continue accumulating input when the user is still inside an open scripting block.
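The depth-tracking idea described above can be sketched as follows. This is a hypothetical, simplified illustration, not the PR's actual SqlScriptBlockTracker: the object and method names are invented, the opener set is a guess based on the keyword list above, and a real implementation must also cope with things the sketch ignores (string literals, comments, and keywords like IF or CASE appearing inside expressions).

```scala
// Hypothetical sketch of keyword-based block-depth tracking.
// Names and keyword sets are illustrative, not the PR's actual code.
object BlockDepthSketch {
  // Keywords assumed to open a compound block. DO stands in for the
  // WHILE ... DO and FOR ... DO forms, whose body starts at DO.
  private val openers = Set("BEGIN", "CASE", "IF", "DO", "LOOP", "REPEAT")

  // Block-name suffixes that may follow END (END IF, END WHILE, ...);
  // these must be skipped so they are not miscounted as new openers.
  private val endSuffixes = Set("IF", "CASE", "WHILE", "LOOP", "REPEAT", "FOR")

  /** Returns the nesting depth after scanning the given keyword tokens. */
  def depthAfter(tokens: Seq[String], initialDepth: Int = 0): Int = {
    var depth = initialDepth
    var i = 0
    while (i < tokens.length) {
      val t = tokens(i).toUpperCase
      if (openers.contains(t)) {
        depth += 1
      } else if (t == "END") {
        depth -= 1
        // Consume a trailing suffix such as the IF in "END IF".
        if (i + 1 < tokens.length && endSuffixes.contains(tokens(i + 1).toUpperCase)) {
          i += 1
        }
      }
      i += 1
    }
    depth
  }
}
```

A depth greater than zero after a semicolon means the semicolon is internal to an open block, so it should not be treated as a statement boundary.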

Why are the changes needed?

The spark-sql CLI uses semicolons to determine statement boundaries, both for splitting multi-statement input (splitSemiColon) and for deciding when to execute in interactive mode (a line ending with ;). SQL Scripting compound blocks use semicolons as internal statement terminators (e.g., BEGIN SELECT 1; SELECT 2; END;), so the CLI incorrectly splits such blocks or executes them before they are complete.

For example, in interactive mode:

spark-sql> BEGIN
         >   SELECT 1;   <-- CLI fires here with incomplete "BEGIN\n  SELECT 1;"

After this change, the CLI waits until the block is fully closed (END;) before executing.
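The interactive-loop condition can be pictured as below. This is a simplified, hypothetical sketch, not the PR's actual code: the function name and signature are invented, and it only shows the decision of whether buffered input is ready to execute.

```scala
// Hypothetical sketch: execute only when the buffered input ends with a
// semicolon AND no SQL Scripting block is still open. `blockDepth` would
// come from a depth tracker like the one the PR adds; names are illustrative.
def shouldExecute(buffered: String, blockDepth: Int): Boolean =
  buffered.trim.endsWith(";") && blockDepth == 0
```

With this check, `BEGIN\n  SELECT 1;` keeps accumulating (depth is still 1), and execution fires only once the closing `END;` brings the depth back to zero.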

Does this PR introduce any user-facing change?

Yes. The spark-sql CLI now correctly accepts multi-line SQL Scripting blocks in both interactive mode and file/-e mode without prematurely splitting or executing them.

How was this patch tested?

New unit tests were added in CliSuite.scala.

Additionally, manual testing with spark-sql:

$ build/sbt -Phive,hive-thriftserver clean package
$ SPARK_PREPEND_CLASSES=true bin/spark-sql 
spark-sql (default)> SELECT current_timestamp();
2026-03-23 13:46:37.797724
Time taken: 0.033 seconds, Fetched 1 row(s)
spark-sql (default)> BEGIN
                   >   DECLARE counter INT DEFAULT 1;
                   >   DECLARE total INT DEFAULT 0;
                   > 
                   >   WHILE counter <= 5 DO
                   >     SET total = total + counter;
                   >     SET counter = counter + 1;
                   >   END WHILE;
                   > 
                   >   SELECT total AS sum_of_first_five;
                   > END;
15
Time taken: 0.363 seconds, Fetched 1 row(s)
spark-sql (default)> SELECT current_timestamp();
2026-03-23 13:46:38.303671
Time taken: 0.028 seconds, Fetched 1 row(s)
spark-sql (default)> 

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Sonnet 4.6), OpenCode (MiMo V2 Pro)

pan3793 (Member, Author) commented Mar 23, 2026

cc @srielau @cloud-fan, could you please take a look?

@pan3793 pan3793 changed the title [SPARK-48338][SQL] spark-sql cli correctly handles SQL Scripting compound blocks [SPARK-56147][SQL] spark-sql cli correctly handles SQL Scripting compound blocks Mar 23, 2026