Conversation
Have you run the benchmarks for this?

sorry, new here, could you tell me how to run them? 😅

I think `cargo bench -p arrow-cast` should be sufficient.
run benchmark cast_kernels

Thank you @Rafferty97 and @aryan-212 -- I kicked off some benchmark runs to verify the performance implications of this change
alamb left a comment

Thank you for this PR @aryan-212 and (especially) for the help reviewing @Rafferty97
```diff
 impl Parser for Float16Type {
     fn parse(string: &str) -> Option<f16> {
-        lexical_core::parse(string.as_bytes())
+        lexical_core::parse(string.trim().as_bytes())
```
I wonder if there will be any performance implications 🤔
```rust
#[test]
fn test_parse_trimmed_whitespace() {
    // Float types
    assert_eq!(Float16Type::parse(" 1.5 "), Some(f16::from_f32(1.5)));
```
Can you please add some tests with only leading whitespace and some tests with only trailing whitespace?
🤖 Arrow criterion benchmark running (GKE) | trigger

🤖 Arrow criterion benchmark completed (GKE) | trigger | Details
Resource Usage: base (merge-base) vs branch
I don't think there is any significant perf regression, is there?

My reading of this result is that this code is 16% slower when parsing strings to floating point. I will rerun the benchmarks to see if this is reproducible
run benchmark cast_kernels

🤖 Arrow criterion benchmark running (GKE) | trigger

🤖 Arrow criterion benchmark completed (GKE) | trigger | Details
Resource Usage: base (merge-base) vs branch
🤔 same slowdown:
hmmm, should I make the change in some other layer where the performance hit wouldn't be as large? Any pointers if possible?

Well, we can see what the existing DataFusion CLI v52.3.0 does:

```
> select ' 4.5'::float;
Optimizer rule 'simplify_expressions' failed
caused by
Arrow error: Cast error: Cannot cast string ' 4.5' to value of Float32 type

> select trim(' 4.5')::float;
+----------------------+
| btrim(Utf8(" 4.5"))  |
+----------------------+
| 4.5                  |
+----------------------+
1 row(s) fetched.
Elapsed 0.022 seconds.
```

Interestingly, DuckDB does automatically trim:

```
andrewlamb@Andrews-MacBook-Pro-3:~/Software/influxdb_pro/ent$ duckdb
DuckDB v1.5.0 (Variegata)
Enter ".help" for usage hints.
D select ' 4.5'::float;
┌────────────────────────┐
│ CAST(' 4.5' AS FLOAT)  │
│         float          │
├────────────────────────┤
│          4.5           │
└────────────────────────┘
```

As does Postgres 🤔:

```
andrewlamb@Andrews-MacBook-Pro-3:~/Software/influxdb_pro/ent$ psql -h localhost -U postgres
psql (14.22 (Homebrew), server 11.16 (Debian 11.16-1.pgdg90+1))
Type "help" for help.
postgres=# select ' 4.5'::float;
 float8
--------
    4.5
(1 row)
```
Is it possible to improve the performance of trim? For example, trim based only on ASCII whitespace rather than having to do UTF-8 checking?
Which issue does this PR close?

Rationale for this change

The `Parser::parse` implementations for numeric types did not trim whitespace before parsing. This caused values like `" 42 "` or `" 1.5 "` to fail parsing and return `None`, even though they represent valid numbers.

What changes are included in this PR?

- Added `.trim()` calls before parsing in the float type `Parser` implementations.
- Added `string.trim()` at the top of the `parser_primitive!` macro, which covers all integer and duration types.

Are these changes tested?

Yes. Added `test_parse_trimmed_whitespace` covering:

Datafusion changes

For the following SQL:

in datafusion we used to get

now after these changes, we get

this behaviour is now aligned with Databricks

Are there any user-facing changes?

Yes. Numeric parsing now accepts strings with leading/trailing whitespace. This is a relaxation of the previous behaviour (previously `None`, now `Some(value)`), so it is not a breaking change.