enable "thin LTO" in release profile #23179
cburroughs wants to merge 1 commit into pantsbuild:main
So the naming here is kind of a mess:

* I don't know how Rust of all places ends up with this combo bool/string type.
* "fat" isn't "more LTO" than "thin"; "thin" and "fat" are more like different algorithms.

But the summary is that "thin" is the new/better algorithm.

This takes `native_engine.so` from 189MiB to 172MiB and in various tests I've run with `hyperfine` I see results like `1.05 ± 0.06`, or `1.02 ± 0.09`. So not going to write a blog post about it, but we don't seem to hit any pathological corner cases and I'll take a few percentage points for free.

References:

* https://nnethercote.github.io/perf-book/build-configuration.html#link-time-optimization
* https://doc.rust-lang.org/cargo/reference/profiles.html#lto
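
For readers who have not seen the setting, the change described above amounts to something like the following in the workspace `Cargo.toml`. This is a minimal sketch based on the Cargo profile documentation linked above, not the PR's actual diff (which is not reproduced here); any other settings already in the release profile are omitted.

```toml
# Sketch only: the release profile with thin LTO enabled.
[profile.release]
# `lto` takes either a boolean or a string, hence the "combo bool/string type":
#   false          - the default; "thin local" LTO within each crate only
#   true or "fat"  - whole-program LTO across all crates in the dependency graph
#   "thin"         - ThinLTO across all crates
#   "off"          - no LTO at all
lto = "thin"
```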
This is a NACK from me.
Thin LTO adds something like 20% to compile times on my machine, for no real perceivable benefit (that's with a populated compilation cache).
I have some prior art here, but I think the way to go is a separate release-lto profile that is highly optimized (CI only), and we keep this release profile for local dev when needed.
https://pantsbuild.slack.com/archives/C0D7TNJHL/p1759710471401089
https://pantsbuild.slack.com/archives/C0D7TNJHL/p1761222477985659
https://pantsbuild.slack.com/archives/C0D7TNJHL/p1772296966678039
https://pantsbuild.slack.com/archives/C0D7TNJHL/p1771776965875209
There are a handful of optimizations we can make, but I think we need to bikeshed the tradeoffs and determine what we really want (I'm always about more performance, less concerned about filesize for tooling like this).
RipGrep: Not 1:1 compatible, but https://github.com/BurntSushi/ripgrep/blob/4519153e5e461527f4bca45b042fff45c4ec6fb9/Cargo.toml#L77
Addendum:
I would likely use a `release` profile with `codegen-units` set to default, and `debug = line-tables-only`. That's a pretty big filesize savings on Linux, and perf improvement across the board for local dev.
`release-lto` would be as optimized as we could reasonably make it, without turning off too many guardrails, or making it too unmanageable to debug in the field.
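
One way the two-profile split proposed above could look in `Cargo.toml` is sketched below. The profile name `release-lto` and the `release` settings come from the comment; the specific values under `release-lto` are illustrative assumptions, since the thread does not settle on them.

```toml
# Sketch only: a fast local `release` profile plus a CI-only optimized profile.
[profile.release]
# No `codegen-units` override here, so Cargo's default applies.
debug = "line-tables-only"   # keep only line tables for backtraces; smaller debug info

[profile.release-lto]
inherits = "release"         # custom profiles must name a profile to inherit from
lto = "thin"                 # assumption: "fat" is the other candidate to benchmark
codegen-units = 1            # assumption: trades compile time for more optimization
```

A custom profile is selected with `cargo build --profile release-lto`, and its artifacts land in `target/release-lto/`. How the Pants build scripts and CI would choose between the two profiles is left open here.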
I'm not deeply familiar with Rust idioms and it is mildly surprising to me that when you develop Pants from source you use the |
"I" don't - "pants" does. It's built into the cargo/pants scripts, and I think a lot of it may be a remnant. Setting the MODE to debug is a way around this, but the default "clone and build" is a release, for better or worse. That's also used in CI I think. It's still pretty slow running in debug, but better ever since call-by-name. I try to maintain a patch for faster release compilation that I have to remember to apply to every branch/repo, which is a mild nightmare. Either way, if we're changing how this works, we may as well try to do it all. The suggestion I had being, setup CI to use |