
[SPARK-56160][SQL] Add DataType classes for nanosecond timestamp types#54966

Open

xiaoxuandev wants to merge 1 commit into apache:master from xiaoxuandev:add-timesatmp-nano-types


@xiaoxuandev (Contributor)

### What changes were proposed in this pull request?

This PR adds two new DataType classes for nanosecond-precision timestamps:

- `TimestampNSType` (with local timezone semantics)
- `TimestampNTZNSType` (without timezone semantics)

Both are singleton types following the same pattern as `TimestampNTZType` (SPARK-35662). They are stored internally as a `Long` representing nanoseconds since the Unix epoch, with a default size of 8 bytes. The representable range is approximately 1677-09-21 to 2262-04-11.
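The quoted range follows directly from storing nanoseconds in a signed 64-bit `Long`; it can be checked with `java.time` (an illustrative snippet, not part of this PR):

```java
import java.time.Instant;

public class NanoRange {
    public static void main(String[] args) {
        // A signed 64-bit count of nanoseconds since the Unix epoch bounds the
        // representable range at Long.MIN_VALUE / Long.MAX_VALUE nanoseconds.
        Instant min = Instant.EPOCH.plusNanos(Long.MIN_VALUE);
        Instant max = Instant.EPOCH.plusNanos(Long.MAX_VALUE);
        System.out.println(min); // 1677-09-21T00:12:43.145224192Z
        System.out.println(max); // 2262-04-11T23:47:16.854775807Z
    }
}
```

This is the same range as pandas' nanosecond `Timestamp`, which is why Parquet files written by Pandas/PyArrow fit the new types exactly.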

This PR also registers the new types in `DataTypes.java` (Java API) and `DataType.scala` (type name registry for JSON/DDL parsing).
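The singleton pattern the PR follows can be sketched as below. This is a simplified, self-contained illustration, not Spark's actual class hierarchy, and the `typeName` strings are assumptions for the example (the PR text does not give them):

```java
// Minimal stand-in for Spark's abstract DataType, for illustration only.
abstract class DataType {
    public abstract String typeName();
    public abstract int defaultSize(); // size in bytes of one value
}

// Singleton: private constructor, one shared INSTANCE, defaultSize of 8
// because values are stored as a Long of nanoseconds since the epoch.
final class TimestampNSType extends DataType {
    public static final TimestampNSType INSTANCE = new TimestampNSType();
    private TimestampNSType() {}
    @Override public String typeName() { return "timestamp_ns"; }   // assumed name
    @Override public int defaultSize() { return 8; }
}

final class TimestampNTZNSType extends DataType {
    public static final TimestampNTZNSType INSTANCE = new TimestampNTZNSType();
    private TimestampNTZNSType() {}
    @Override public String typeName() { return "timestamp_ntz_ns"; } // assumed name
    @Override public int defaultSize() { return 8; }
}
```

A single shared instance per type is what lets equality checks like `dt == TimestampNSType.INSTANCE` work by reference, matching how `TimestampNTZType` is used throughout the analyzer.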

### Why are the changes needed?

Microsecond precision is insufficient for a growing number of workloads:

- Parquet files written by Pandas/PyArrow default to `TIMESTAMP(NANOS)`
- Iceberg V3 adds `timestamp_ns` / `timestamptz_ns` types
- Financial exchange data (NYSE, NASDAQ, CME) uses nanosecond timestamps
- OpenTelemetry traces use nanosecond timestamps

Without native nanosecond types, Spark either throws `AnalysisException` on nanosecond Parquet columns or reads them as raw `LongType` via `spark.sql.legacy.parquet.nanosAsLong`, losing all timestamp semantics.
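With the legacy flag, recovering timestamp semantics is left entirely to the user: a nanosecond column surfaces as a bare `Long`, and the instant must be rebuilt by hand. An illustrative sketch of that workaround:

```java
import java.time.Instant;

public class NanosAsLong {
    // What a user must do today when spark.sql.legacy.parquet.nanosAsLong
    // surfaces a nanosecond column as a raw Long.
    static Instant fromRawNanos(long rawNanos) {
        // ofEpochSecond's nanoAdjustment argument accepts the full signed
        // Long range, including negative values for pre-epoch timestamps.
        return Instant.ofEpochSecond(0, rawNanos);
    }

    public static void main(String[] args) {
        long raw = 1_700_000_000_123_456_789L; // nanoseconds since the epoch
        System.out.println(fromRawNanos(raw)); // 2023-11-14T22:13:20.123456789Z
    }
}
```

A native `TimestampNSType` would make this conversion (and timezone handling, ordering, and formatting) the engine's job rather than every caller's.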

This is the first step toward native nanosecond timestamp support. Subsequent PRs will add SQL parser keywords, `Cast` rules, Parquet read/write support, and Arrow integration.

### Does this PR introduce _any_ user-facing change?

No. The types are defined but not yet wired into the SQL parser or any data source.

### How was this patch tested?

Added `checkDefaultSize` tests in `DataTypeSuite` for both new types.

### Was this patch authored or co-authored using generative AI tooling?

Yes, co-authored with Kiro.

xiaoxuandev force-pushed the add-timesatmp-nano-types branch from 8e16374 to b295070 on March 23, 2026 at 22:58.