[SPARK-56167][PS] Align astype with pandas 3 default string behavior #54968

Open
ueshin wants to merge 1 commit into apache:master from ueshin:issues/SPARK-56167/astype

ueshin commented Mar 23, 2026

What changes were proposed in this pull request?

This PR updates a few pandas-on-Spark astype paths to match pandas 3 behavior for the default string dtype.

In pandas 3, astype(str) returns the default string dtype and preserves missing values instead of converting them to string literals such as "NaN" or "<NA>". pandas-on-Spark still used the older behavior in a few localized conversion paths, including numeric, null, string, and boolean casts.
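The difference between the two behaviors can be sketched in plain Python, without pandas or Spark. This is only an illustrative model: the helper names `is_missing`, `cast_to_str_legacy`, and `cast_to_str_preserving` are hypothetical and are not part of pandas or pandas-on-Spark, `None` stands in for a missing value, and the exact literal the older path produces depends on the dtype.

```python
import math
from typing import Any, List, Optional


def is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing (a simplification of pandas' rules)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def cast_to_str_legacy(values: List[Any]) -> List[str]:
    """Older behavior: every value, including a missing one, becomes a string literal."""
    return [str(v) for v in values]


def cast_to_str_preserving(values: List[Any]) -> List[Optional[str]]:
    """pandas 3 style: stringify real values and keep missing values missing."""
    return [None if is_missing(v) else str(v) for v in values]


data = [1.5, float("nan"), 3.0]
legacy = cast_to_str_legacy(data)         # the NaN turns into a literal string
preserved = cast_to_str_preserving(data)  # the NaN stays missing (None here)
```

In this model the legacy result contains a string in place of the missing value, while the preserving result keeps a gap that later operations can still detect as missing.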

This PR makes three small changes in python/pyspark/pandas/data_type_ops/:

  • update the shared string cast helper so astype(str) preserves missing values for pandas 3 string results
  • align boolean-to-string casting with the same pandas 3 behavior, including the nullable metadata on the result field
  • align string-to-bool casting for pandas 3 string-backed data with pandas' current astype(bool) result

Why are the changes needed?

Without this change, several pandas-on-Spark astype tests fail with pandas 3 because some conversion paths still follow the older string-casting behavior.

The failures came from two related mismatches:

  • astype(str) converted missing values into string literals instead of preserving them as missing values
  • some follow-up casts from pandas 3 string-backed data did not match pandas' current behavior

This patch fixes those localized mismatches while keeping the pandas 2 behavior unchanged.
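One way to keep pandas 2 behavior unchanged is to gate the new path on the installed pandas major version. The sketch below shows that idea under stated assumptions: `major_version` and `preserves_missing_on_str_cast` are hypothetical names, and the actual patch may detect pandas 3 differently.

```python
def major_version(version: str) -> int:
    """Extract the major component from a version string such as '3.0.0'."""
    return int(version.split(".")[0])


def preserves_missing_on_str_cast(pandas_version: str) -> bool:
    """Assumed gate: only pandas 3 and later keep missing values on astype(str)."""
    return major_version(pandas_version) >= 3


# pandas 2 keeps the older literal-producing path; pandas 3 takes the new one.
pandas2_uses_new_path = preserves_missing_on_str_cast("2.2.3")
pandas3_uses_new_path = preserves_missing_on_str_cast("3.0.0")
```

A caller would pass the installed version string (for example `pandas.__version__`) and branch on the result, so the pandas 2 code path stays exactly as it was.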

Does this PR introduce any user-facing change?

Yes.

For pandas 3 users, pandas-on-Spark astype(str) now preserves missing values in the affected paths instead of converting them to string literals. This also fixes related behavior for boolean and string-backed casts that depend on pandas 3's default string behavior.

How was this patch tested?

The existing pandas-on-Spark astype tests, which fail with pandas 3 without this change, should pass.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Codex (GPT-5)
