[SPARK-56125][SQL] Simplify schema calculation for Merge Into Schema Evolution #54934
Open
szehon-ho wants to merge 2 commits into apache:master
szehon-ho
commented
Mar 21, 2026
| * @param root type schema | ||
| * @param path name segments | ||
| */ | ||
| def fieldExistsAtPath( |
Member
Author
Unfortunately, this is still needed, but only for the top-level unresolved reference case.
xiaoxuandev
reviewed
Mar 22, 2026
| private def fieldExistsAtPathInternal( | ||
| dt: DataType, | ||
| parts: Seq[String]): Boolean = { | ||
| def checkAndRecurse( |
Contributor
This looks correct.
nit: checkAndRecurse seems unnecessary; can we inline the logic? Also, could we rewrite the recursion using pattern matching on parts, so that the base case is handled in one place?
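A rough sketch of what that could look like, not the PR's actual code. To stay self-contained it uses simplified stand-in types (`Field`, `StructType`, `ArrayType`, `MapType` below are hypothetical, not Spark's classes), compares names case-sensitively instead of using a resolver, and assumes that an empty path counts as "exists". The `element` / `key` / `value` segment names follow the scaladoc in this diff.

```scala
// Simplified stand-ins for Spark's DataType hierarchy (hypothetical, illustration only).
sealed trait DataType
case object IntType extends DataType
final case class Field(name: String, dataType: DataType)
final case class StructType(fields: Seq[Field]) extends DataType
final case class ArrayType(elementType: DataType) extends DataType
final case class MapType(keyType: DataType, valueType: DataType) extends DataType

object PathLookup {
  // Returns true if `parts` names something reachable from `dt`.
  // Matching on (dt, parts) keeps the base case (empty path) in one place.
  def fieldExistsAtPath(dt: DataType, parts: Seq[String]): Boolean =
    (dt, parts) match {
      // Base case: every segment has been consumed.
      case (_, Seq()) => true
      // Struct: find the named field, then recurse on the remaining segments.
      case (StructType(fields), head +: tail) =>
        fields.find(_.name == head).exists(f => fieldExistsAtPath(f.dataType, tail))
      // Arrays and maps are addressed through fixed segment names.
      case (ArrayType(elem), "element" +: tail) => fieldExistsAtPath(elem, tail)
      case (MapType(k, _), "key" +: tail) => fieldExistsAtPath(k, tail)
      case (MapType(_, v), "value" +: tail) => fieldExistsAtPath(v, tail)
      // Anything else (leaf type with segments left, unknown segment) does not exist.
      case _ => false
    }
}
```

With this shape there is no inner helper, and each container type gets exactly one case.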
xiaoxuandev
reviewed
Mar 22, 2026
| * @param valueType type of the assignment value at this path (typically source column) | ||
| * @param changes accumulator for [[TableChange]] instances | ||
| * @param fieldPath qualified path segments for nested columns (`element` / `key` / `value` | ||
| * under arrays and maps) |
xiaoxuandev
reviewed
Mar 22, 2026
| val changes = mutable.LinkedHashSet.empty[TableChange] | ||
| val failIncompatible: () => Nothing = () => | ||
| throw QueryExecutionErrors.failedToMergeIncompatibleSchemasError( | ||
| originalTarget, originalSource, null) |
Contributor
nit: failIncompatible passes null as the cause, so the error shows only the full target/source schemas, with no hint about which field path actually conflicts. Since fieldPath, keyType, and valueType are already available at the call site, should we include them in the exception? That would make debugging much easier for deeply nested schemas.
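One possible shape, sketched with stand-ins rather than the real Spark helpers: `SchemaErrors.failedToMerge` below is a hypothetical substitute that only mirrors the (target, source, cause) argument order seen in the diff, and `conflictAt` is an invented helper. Types are plain strings to keep the example self-contained.

```scala
// Hypothetical stand-ins; not Spark's QueryExecutionErrors API.
object SchemaErrors {
  // Mirrors the (target, source, cause) argument order used in the diff.
  def failedToMerge(target: String, source: String, cause: Throwable): RuntimeException =
    new RuntimeException(s"Failed to merge incompatible schemas $target and $source", cause)

  // Builds a cause that names the conflicting path and the two types involved.
  def conflictAt(fieldPath: Seq[String], keyType: String, valueType: String): Throwable =
    new IllegalArgumentException(
      s"Incompatible types at '${fieldPath.mkString(".")}': " +
        s"key type $keyType, value type $valueType")
}

object FailIncompatibleSketch {
  // Instead of a zero-argument closure that passes null, take the call-site context.
  def failIncompatible(
      originalTarget: String,
      originalSource: String,
      fieldPath: Seq[String],
      keyType: String,
      valueType: String): Nothing =
    throw SchemaErrors.failedToMerge(
      originalTarget,
      originalSource,
      SchemaErrors.conflictAt(fieldPath, keyType, valueType))
}
```

Passing the context as a cause leaves the top-level error message unchanged and still surfaces the path in the stack trace.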
What changes were proposed in this pull request?
Replace 'sourceSchemaForSchemaEvolution' with a simpler 'pendingChanges'. Also reduce path-based comparisons where possible, that is, where the resolved type from the target/source is already available.
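To illustrate the idea only (this is not the code in the PR), here is a minimal sketch of accumulating pending changes directly while walking the assignment value type against the target type. `Field`, `StructType`, `TableChange`, and `AddColumn` below are simplified, hypothetical stand-ins for the Spark classes; names are compared case-sensitively, and type widening and conflict handling are left out.

```scala
import scala.collection.mutable

// Simplified stand-ins (hypothetical) for Spark's DataType and TableChange.
sealed trait DataType
case object IntType extends DataType
case object StringType extends DataType
final case class Field(name: String, dataType: DataType)
final case class StructType(fields: Seq[Field]) extends DataType

sealed trait TableChange
final case class AddColumn(path: Seq[String], dataType: DataType) extends TableChange

object PendingChanges {
  // Walks the value type against the target type and records a change
  // for every field that the target is missing at `fieldPath`.
  def collect(
      targetType: DataType,
      valueType: DataType,
      fieldPath: Seq[String],
      changes: mutable.LinkedHashSet[TableChange]): Unit =
    (targetType, valueType) match {
      case (StructType(targetFields), StructType(valueFields)) =>
        valueFields.foreach { vf =>
          targetFields.find(_.name == vf.name) match {
            // Field exists on both sides: compare the resolved types directly.
            case Some(tf) => collect(tf.dataType, vf.dataType, fieldPath :+ vf.name, changes)
            // Field is missing in the target: record it as a pending change.
            case None => changes += AddColumn(fieldPath :+ vf.name, vf.dataType)
          }
        }
      // Leaf types: widening and incompatibility checks are out of scope here.
      case _ => ()
    }
}
```

The insertion-ordered set keeps the changes in the order they were discovered and drops duplicates, so no intermediate pruned source schema is needed.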
Why are the changes needed?
This was suggested by @aokolnychyi after the initial PR was merged. 'sourceSchemaForSchemaEvolution' is confusing: it is supposed to be a view of the source schema, pruned to the fields actually referenced by the MERGE INTO statement. The subsequent logic compares it with the target table schema, but it is hard to explain.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Ran existing tests.
Was this patch authored or co-authored using generative AI tooling?
Yes, Cursor.