Skip to content

[Variant] Align cast logic for variant_get to cast kernel for numeric/bool types#9563

Open
klion26 wants to merge 4 commits intoapache:mainfrom
klion26:variant_cast_number_and_bool
Open

[Variant] Align cast logic for variant_get to cast kernel for numeric/bool types#9563
klion26 wants to merge 4 commits intoapache:mainfrom
klion26:variant_cast_number_and_bool

Conversation

@klion26
Copy link
Member

@klion26 klion26 commented Mar 16, 2026

Which issue does this PR close?

What changes are included in this PR?

Align tests with cast kernel

Are these changes tested?

Covered by the existing tests

Are there any user-facing changes?

No

@klion26 klion26 changed the title WIP [Variant] Align cast logic for variant_get to cast kernel for numeric/bool types Mar 16, 2026
@github-actions github-actions bot added arrow Changes to the arrow crate parquet-variant parquet-variant* crates labels Mar 16, 2026
Copy link
Member Author

@klion26 klion26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scovich This is what is in my mind to align cast logic for variant_get with cast kernel, current, only changed int/float/double/bool (no decimal for now).

Currently, this will change some behaviors (details in inline comments). Please help review this when you're in free, thanks.

Variant::Int64(i) if fits_precision::<11>(i) => Some(f16::from_f32(i as _)),
_ => None,
}
self.as_num::<f16>()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will change the behavior from before. Now 65536 will return Some(inf) instead of None.

Variant::Int64(i) => num_cast::<i64, T>(i),
Variant::Float(f) => num_cast::<f32, T>(f),
Variant::Double(d) => num_cast::<f64, T>(d),
Variant::Decimal4(d) if d.scale() == 0 => num_cast::<_, T>(d.integer()),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from/to_decimal needs to be added in a following commit if the direction is right

@sdf-jkl
Copy link
Contributor

sdf-jkl commented Mar 18, 2026

I did a quick run, and it looks cool! I'll take a closer look tomorrow

Copy link
Contributor

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High level question: Should we consider keeping the lossless as_xxx methods, with new cast_as_xxx methods that are fully aggressive?

Asking because even lossless casting is complex enough that users could easily get it wrong, which could be a reason for the library to provide it.

Alternatively, we might consider expanding CastOptions to differentiate between lossless casts (e.g. 42i8 -> 42u64) vs. lossy casts (also allows e.g. f64::PI -> 3i32) vs. converting casts (also allows e.g. "true" -> true, 0 -> false, "123" -> 123i8)?

{
if value {
// a workaround to cast a primitive to type O, infallible
num_traits::cast::cast(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like T::Native already provides ONE and ZERO as constants?
https://docs.rs/arrow/latest/arrow/array/trait.ArrowNativeTypeOp.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could have an impedance mismatch between ArrowNativeType (arrow) and NumCast (num_cast) tho?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, ArrowNativeTypeOp contains ONE and ZEOR, but Rust type (i32, etc) does not have ZEOR/ONE.

where
I: Default + PartialEq,
{
value != I::default()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't aware that rust or SQL allows coercing numbers to bool?
But arrow-cast already allows this, I guess?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meanwhile, this should probably compare directly against I::ZERO?
(to match the other direction, below)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, arrorw-cast already allows this, This is the current equivalent conversion for arrow-cast.

fn numeric_to_bool_cast<T>(from: &PrimitiveArray<T>) -> Result<BooleanArray, ArrowError>
where
T: ArrowPrimitiveType + ArrowPrimitiveType,
{
let mut b = BooleanBuilder::with_capacity(from.len());
for i in 0..from.len() {
if from.is_null(i) {
b.append_null();
} else if from.value(i) != T::default_value() {
b.append_value(true);
} else {
b.append_value(false);
}
}

match self {
Variant::BooleanTrue => Some(true),
Variant::BooleanFalse => Some(false),
Variant::Int8(i) => Some(cast_num_to_bool::<i8>(*i)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these type annotations actually necessary? Seems like the compiler would infer them easily?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the type annotations can be removed.

Variant::Int64(i) => Some(cast_num_to_bool::<i64>(*i)),
Variant::Float(f) => Some(cast_num_to_bool::<f32>(*f)),
Variant::Double(d) => Some(cast_num_to_bool::<f64>(*d)),
Variant::ShortString(s) => cast_single_string_to_boolean_default(s.0),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ShortString provides Deref?

Suggested change
Variant::ShortString(s) => cast_single_string_to_boolean_default(s.0),
Variant::ShortString(s) => cast_single_string_to_boolean_default(*s),

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s here is &ShortString, *s will give ShortString, we need to use &**s to convert to &str, adjusted to s.as_ref()

@klion26 klion26 force-pushed the variant_cast_number_and_bool branch from 4f7214e to e0ee913 Compare March 19, 2026 08:48
Copy link
Member Author

@klion26 klion26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scovich Thanks for the review. I've addressed some of the comments and replied to the others.

For the function names Variant::as_xx and Variant::cast_as_xxx, I think as_xx is fine, as conversion name guide gives that conversion prefix like as_/to_, and we can do 3.2f64 as i64 in Rust code, but maybe we can add lossy suffix in the name?

Alternatively, we might consider expanding CastOptions to differentiate between lossless casts (e.g. 42i8 -> 42u64) vs. lossy casts (also allows e.g. f64::PI -> 3i32) vs. converting casts (also allows e.g. "true" -> true, 0 -> false, "123" -> 123i8)?

I've tried adding the lossy flag to CastOptions in #9169; the conclusion is to keep it as it is for now.

where
I: Default + PartialEq,
{
value != I::default()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, arrorw-cast already allows this, This is the current equivalent conversion for arrow-cast.

fn numeric_to_bool_cast<T>(from: &PrimitiveArray<T>) -> Result<BooleanArray, ArrowError>
where
T: ArrowPrimitiveType + ArrowPrimitiveType,
{
let mut b = BooleanBuilder::with_capacity(from.len());
for i in 0..from.len() {
if from.is_null(i) {
b.append_null();
} else if from.value(i) != T::default_value() {
b.append_value(true);
} else {
b.append_value(false);
}
}

{
if value {
// a workaround to cast a primitive to type O, infallible
num_traits::cast::cast(1)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, ArrowNativeTypeOp contains ONE and ZEOR, but Rust type (i32, etc) does not have ZEOR/ONE.

match self {
Variant::BooleanTrue => Some(true),
Variant::BooleanFalse => Some(false),
Variant::Int8(i) => Some(cast_num_to_bool::<i8>(*i)),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the type annotations can be removed.

Variant::Int64(i) => Some(cast_num_to_bool::<i64>(*i)),
Variant::Float(f) => Some(cast_num_to_bool::<f32>(*f)),
Variant::Double(d) => Some(cast_num_to_bool::<f64>(*d)),
Variant::ShortString(s) => cast_single_string_to_boolean_default(s.0),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s here is &ShortString, *s will give ShortString, we need to use &**s to convert to &str, adjusted to s.as_ref()

@klion26 klion26 force-pushed the variant_cast_number_and_bool branch from e0ee913 to a0f711c Compare March 19, 2026 09:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate parquet-variant parquet-variant* crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Variant] Align cast logic for variant_get to cast kernel for numeric/bool types

3 participants