[Variant] Align cast logic for variant_get to cast kernel for numeric/bool types#9563
[Variant] Align cast logic for variant_get to cast kernel for numeric/bool types#9563klion26 wants to merge 4 commits intoapache:mainfrom
Conversation
klion26
left a comment
There was a problem hiding this comment.
@scovich This is what is in my mind to align cast logic for variant_get with cast kernel, current, only changed int/float/double/bool (no decimal for now).
Currently, this will change some behaviors (details in inline comments). Please help review this when you're in free, thanks.
| Variant::Int64(i) if fits_precision::<11>(i) => Some(f16::from_f32(i as _)), | ||
| _ => None, | ||
| } | ||
| self.as_num::<f16>() |
There was a problem hiding this comment.
This will change the behavior from before. Now 65536 will return Some(inf) instead of None.
| Variant::Int64(i) => num_cast::<i64, T>(i), | ||
| Variant::Float(f) => num_cast::<f32, T>(f), | ||
| Variant::Double(d) => num_cast::<f64, T>(d), | ||
| Variant::Decimal4(d) if d.scale() == 0 => num_cast::<_, T>(d.integer()), |
There was a problem hiding this comment.
from/to_decimal needs to be added in a following commit if the direction is right
|
I did a quick run, and it looks cool! I'll take a closer look tomorrow |
scovich
left a comment
There was a problem hiding this comment.
High level question: Should we consider keeping the lossless as_xxx methods, with new cast_as_xxx methods that are fully aggressive?
Asking because even lossless casting is complex enough that users could easily get it wrong, which could be a reason for the library to provide it.
Alternatively, we might consider expanding CastOptions to differentiate between lossless casts (e.g. 42i8 -> 42u64) vs. lossy casts (also allows e.g. f64::PI -> 3i32) vs. converting casts (also allows e.g. "true" -> true, 0 -> false, "123" -> 123i8)?
| { | ||
| if value { | ||
| // a workaround to cast a primitive to type O, infallible | ||
| num_traits::cast::cast(1) |
There was a problem hiding this comment.
Seems like T::Native already provides ONE and ZERO as constants?
https://docs.rs/arrow/latest/arrow/array/trait.ArrowNativeTypeOp.html
There was a problem hiding this comment.
I guess we could have an impedance mismatch between ArrowNativeType (arrow) and NumCast (num_cast) tho?
There was a problem hiding this comment.
Yes, ArrowNativeTypeOp contains ONE and ZEOR, but Rust type (i32, etc) does not have ZEOR/ONE.
| where | ||
| I: Default + PartialEq, | ||
| { | ||
| value != I::default() |
There was a problem hiding this comment.
I wasn't aware that rust or SQL allows coercing numbers to bool?
But arrow-cast already allows this, I guess?
There was a problem hiding this comment.
Meanwhile, this should probably compare directly against I::ZERO?
(to match the other direction, below)
There was a problem hiding this comment.
Yes, arrorw-cast already allows this, This is the current equivalent conversion for arrow-cast.
arrow-rs/arrow-cast/src/cast/mod.rs
Lines 2534 to 2548 in 88422cb
parquet-variant/src/variant.rs
Outdated
| match self { | ||
| Variant::BooleanTrue => Some(true), | ||
| Variant::BooleanFalse => Some(false), | ||
| Variant::Int8(i) => Some(cast_num_to_bool::<i8>(*i)), |
There was a problem hiding this comment.
Are these type annotations actually necessary? Seems like the compiler would infer them easily?
There was a problem hiding this comment.
Yes, the type annotations can be removed.
parquet-variant/src/variant.rs
Outdated
| Variant::Int64(i) => Some(cast_num_to_bool::<i64>(*i)), | ||
| Variant::Float(f) => Some(cast_num_to_bool::<f32>(*f)), | ||
| Variant::Double(d) => Some(cast_num_to_bool::<f64>(*d)), | ||
| Variant::ShortString(s) => cast_single_string_to_boolean_default(s.0), |
There was a problem hiding this comment.
ShortString provides Deref?
| Variant::ShortString(s) => cast_single_string_to_boolean_default(s.0), | |
| Variant::ShortString(s) => cast_single_string_to_boolean_default(*s), |
There was a problem hiding this comment.
s here is &ShortString, *s will give ShortString, we need to use &**s to convert to &str, adjusted to s.as_ref()
4f7214e to
e0ee913
Compare
klion26
left a comment
There was a problem hiding this comment.
@scovich Thanks for the review. I've addressed some of the comments and replied to the others.
For the function names Variant::as_xx and Variant::cast_as_xxx, I think as_xx is fine, as conversion name guide gives that conversion prefix like as_/to_, and we can do 3.2f64 as i64 in Rust code, but maybe we can add lossy suffix in the name?
Alternatively, we might consider expanding CastOptions to differentiate between lossless casts (e.g. 42i8 -> 42u64) vs. lossy casts (also allows e.g. f64::PI -> 3i32) vs. converting casts (also allows e.g. "true" -> true, 0 -> false, "123" -> 123i8)?
I've tried adding the lossy flag to CastOptions in #9169; the conclusion is to keep it as it is for now.
| where | ||
| I: Default + PartialEq, | ||
| { | ||
| value != I::default() |
There was a problem hiding this comment.
Yes, arrorw-cast already allows this, This is the current equivalent conversion for arrow-cast.
arrow-rs/arrow-cast/src/cast/mod.rs
Lines 2534 to 2548 in 88422cb
| { | ||
| if value { | ||
| // a workaround to cast a primitive to type O, infallible | ||
| num_traits::cast::cast(1) |
There was a problem hiding this comment.
Yes, ArrowNativeTypeOp contains ONE and ZEOR, but Rust type (i32, etc) does not have ZEOR/ONE.
parquet-variant/src/variant.rs
Outdated
| match self { | ||
| Variant::BooleanTrue => Some(true), | ||
| Variant::BooleanFalse => Some(false), | ||
| Variant::Int8(i) => Some(cast_num_to_bool::<i8>(*i)), |
There was a problem hiding this comment.
Yes, the type annotations can be removed.
parquet-variant/src/variant.rs
Outdated
| Variant::Int64(i) => Some(cast_num_to_bool::<i64>(*i)), | ||
| Variant::Float(f) => Some(cast_num_to_bool::<f32>(*f)), | ||
| Variant::Double(d) => Some(cast_num_to_bool::<f64>(*d)), | ||
| Variant::ShortString(s) => cast_single_string_to_boolean_default(s.0), |
There was a problem hiding this comment.
s here is &ShortString, *s will give ShortString, we need to use &**s to convert to &str, adjusted to s.as_ref()
e0ee913 to
a0f711c
Compare
Which issue does this PR close?
variant_getto cast kernel for numeric/bool types #9564 .What changes are included in this PR?
Align tests with cast kernel
Are these changes tested?
Covered by the existing tests
Are there any user-facing changes?
No