feat: add pcre2 as optional feature #1959
michaelfeil
left a comment
This looks good to me, would recommend adding.
I ran similar benchmarks in #1968 and noticed that the speedup is so low because we did not add benchmarks that capture long context + parallel encoding. Performance should be a little better on longer context.
I also think this should be able to cross-compile via PyO3, but I'm not fully sure. The trade-offs are similar to those of other regex backends.
ArthurZucker
left a comment
LGTM, but can we add some docs so users know they can potentially get better performance with this? 🤗
I'll investigate the tradeoffs, as it might be worth defaulting to it!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@ArthurZucker I have added it to the I have also added it into the python bindings for maturin develop, but let me know if that's out of the scope here, happy to revert or make the necessary changes.
Motivation: while profiling performance, I noticed onig showing up in the profiles and tried swapping it for pcre2. Happy to get some feedback, as I'm not deeply familiar with the tradeoffs.
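For readers unfamiliar with how an optional backend is usually wired in, here is a minimal sketch of compile-time selection behind a Cargo feature. The feature name `pcre2` and the function `regex_backend` are assumptions made for illustration; this is not the code from this PR, and the real implementation may be structured differently.

```rust
// Hypothetical sketch: `pcre2` as a feature name and `regex_backend` as a
// function name are illustrative assumptions, not identifiers from this PR.

/// Returns the name of the regex backend chosen at compile time.
/// This variant is compiled only when the (assumed) `pcre2` feature is enabled.
#[cfg(feature = "pcre2")]
pub fn regex_backend() -> &'static str {
    "pcre2"
}

/// Fallback variant: compiled when the (assumed) `pcre2` feature is off,
/// so the default backend stays unchanged for existing users.
#[cfg(not(feature = "pcre2"))]
pub fn regex_backend() -> &'static str {
    "onig"
}
```

Because exactly one of the two definitions is compiled, callers do not need to know which backend is active, and builds that never enable the feature are unaffected.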
I have validated that all tests pass, and the benchmarks show that it's better for GPT2 and Llama3 models:
Commands used:
Based on perf these were my CPU samples: