enable "thin LTO" in release profile #23179

Open
cburroughs wants to merge 1 commit into pantsbuild:main from cburroughs:csb/enable-lto

@cburroughs
Contributor

So the naming here is kind of a mess:
 * I don't know how Rust of all places ends up with this combo
 bool/string type.
 * "fat" isn't "more LTO" than "thin"; "thin" and "fat" are more like
 different algorithms.

But the summary is that "thin" is the new/better algorithm.

This takes `native_engine.so` from 189MiB to 172MiB, and in various
tests I've run with `hyperfine` the new build comes out around
`1.05 ± 0.06` or `1.02 ± 0.09` times faster.  So not going to write a
blog post about it, but we don't seem to hit any pathological corner
cases and I'll take a few percentage points for free.

References:
 * https://nnethercote.github.io/perf-book/build-configuration.html#link-time-optimization
 * https://doc.rust-lang.org/cargo/reference/profiles.html#lto
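
For context, the diff itself is not shown in this conversation. A minimal sketch of what the change presumably amounts to in the Rust workspace's `Cargo.toml`, with the other values `lto` accepts (per the Cargo profiles reference linked above) listed as comments to show the bool/string mix:

```toml
# Sketch only: the actual diff is not shown in this conversation.
[profile.release]
# Accepted values for `lto`, per the Cargo reference:
#   false          -> "thin local LTO": only across the codegen units of each crate (the default)
#   true or "fat"  -> LTO across all crates in the dependency graph
#   "thin"         -> ThinLTO across crates; usually much cheaper to build than "fat"
#   "off"          -> no LTO at all
lto = "thin"
```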
@cburroughs cburroughs self-assigned this Mar 17, 2026
@cburroughs cburroughs marked this pull request as ready for review March 17, 2026 17:43
Member

@sureshjoshi sureshjoshi left a comment

This is a NACK from me.

Thin LTO adds something like 20% to compile times on my machine, for no real perceivable benefit (that's with a populated compilation cache).

I have some prior art here, but I think the way to go is a separate release-lto profile that is highly optimized (CI only), and we keep this release profile for local dev when needed.
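
A separate profile of this kind can be declared with Cargo's custom profiles. A minimal sketch, assuming the name `release-lto` from the comment above; the `lto` value is illustrative rather than something agreed in this thread, and how the Pants build scripts would select the profile is not covered here:

```toml
# Hypothetical CI-only profile; starts from every setting in [profile.release].
[profile.release-lto]
inherits = "release"
lto = "thin"   # illustrative; "fat" is the other cross-crate option

# Selected explicitly, e.g.:
#   cargo build --profile release-lto
# Artifacts land in target/release-lto/ rather than target/release/.
```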

https://pantsbuild.slack.com/archives/C0D7TNJHL/p1759710471401089
https://pantsbuild.slack.com/archives/C0D7TNJHL/p1761222477985659
https://pantsbuild.slack.com/archives/C0D7TNJHL/p1772296966678039
https://pantsbuild.slack.com/archives/C0D7TNJHL/p1771776965875209

There are a handful of optimizations we can make, but I think we need to bikeshed the tradeoffs and determine what we really want (I'm always about more performance, less concerned about filesize for tooling like this).

RipGrep: Not 1:1 compatible, but https://github.com/BurntSushi/ripgrep/blob/4519153e5e461527f4bca45b042fff45c4ec6fb9/Cargo.toml#L77

Addendum:
I would likely use a release profile with codegen-units set to default, and debug = line-tables-only. That's a pretty big filesize savings on Linux, and perf improvement across the board for local dev.

release-lto would be as optimized as we could reasonably make it, without turning off too many guardrails, or making it too unmanageable to debug in the field.
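
Put together, the addendum might translate to something like the following. This is a sketch under assumptions: it does not reflect the current Pants `Cargo.toml`, and the `release-lto` values are placeholders for whatever the bikeshedding settles on.

```toml
# Day-to-day profile: stay close to Cargo's release defaults.
[profile.release]
# No `codegen-units` override here, so Cargo's release default (16) applies.
debug = "line-tables-only"   # file/line info for backtraces, without full variable/type debug info

# CI-only profile: trade compile time for runtime performance.
[profile.release-lto]
inherits = "release"         # keeps debug = "line-tables-only" for debugging in the field
lto = "thin"                 # placeholder; this is the value the PR proposes
codegen-units = 1            # placeholder; fewer units optimize better but compile slower
```

Leaving `codegen-units` unset in `release` keeps local builds parallel, while the inherited line tables keep backtraces from CI-built binaries readable.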

@cburroughs
Contributor Author

I'm not deeply familiar with Rust idioms and it is mildly surprising to me that when you develop Pants from source you use the `release` profile and not `dev`. But maybe that is typical? Anyway, I don't care if we call the thing used for actual releases `release` or `really-release` or `release-lto`.

@sureshjoshi
Member

> it is mildly surprising to me that when you develop Pants from source you use the release profile and not dev

"I" don't - "pants" does. It's built into the cargo/pants scripts, and I think a lot of it may be a remnant. Setting the MODE to debug is a way around this, but the default "clone and build" is a release, for better or worse. That's also used in CI I think.

It's still pretty slow running in debug, but better ever since call-by-name. I try to maintain a patch for faster release compilation that I have to remember to apply to every branch/repo, which is a mild nightmare.

Either way, if we're changing how this works, we may as well try to do it all.

My suggestion is to set up CI to use release-lto with all optimizations enabled, while regular day-to-day release mode uses slightly fewer optimizations than we currently have (i.e., the defaults).
