perf(pm): add hyperfine --warmup 1 to phase-bench#2908
Merged
The bench iterates PMs in fixed order (`utoo, utoo-next, utoo-npm, bun` per `PM_LIST`) within each phase. The first PM pays the cold-network tax (DNS resolver miss, cold TLS session ticket, unpopulated npm CDN edge POP), and `--runs 3` averages that cold iteration into the wall mean. PMs that run later inherit the warm state for free.

A standalone smoke test confirmed the magnitude: the same utoo binary, run twice back-to-back through `pm-bench-pcap.sh`, showed ~3s wall and 270 MB pcap-size delta on `p3_install` between the first and second run, attributed to CDN/DNS/TLS warm-up. Folded through `--runs 3` averaging, that is roughly +1s on the first PM's mean, the same magnitude as the per-PR p0 deltas observed on #2903 / #2904 / #2905, which makes those mean deltas indistinguishable from ordering bias.

`--warmup 1` makes hyperfine run one untimed iteration through `--prepare` before the timed runs start, so each PM enters its measurement window with the network path warmed independently. The cross-PM ordering bias in the later measurements collapses because every PM starts from a primed-network state (the first PM's warmup primes the runner; subsequent PMs' warmups add no further priming but keep every PM's run sequence identical).

This costs one extra iteration per PM × phase (≈ 33% wall increase on the bench job, well within tolerance) and is independent of the script's PM-ordering question, which we'll address separately with round-robin if `--warmup` alone doesn't close the gap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code Review
This pull request introduces a warmup phase to the hyperfine benchmarking command in bench/pm-bench-phases.sh to mitigate network latency bias during performance testing. Feedback suggests shortening the extensive inline comment for better readability, as the detailed rationale is already captured in the commit history.
📊 pm-bench-phases ·
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 9.40s | 0.41s | 10.06s | 10.02s | 740M | 339.3K |
| utoo-next | 8.15s | 0.49s | 10.76s | 12.23s | 1.32G | 185.9K |
| utoo-npm | 8.27s | 0.42s | 10.76s | 12.29s | 1.41G | 192.7K |
| utoo | 8.69s | 0.75s | 10.78s | 12.56s | 1.47G | 175.9K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 14.2K | 16.3K | 1.17G | 6M | 1.84G | 1.72G | 1M |
| utoo-next | 117.9K | 80.7K | 1.14G | 4M | 1.68G | 1.68G | 2M |
| utoo-npm | 124.5K | 85.7K | 1.14G | 4M | 1.68G | 1.68G | 2M |
| utoo | 133.4K | 87.0K | 1.14G | 5M | 1.68G | 1.68G | 2M |
p1_resolve
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 3.31s | 2.53s | 4.00s | 1.02s | 505M | 171.7K |
| utoo-next | 3.17s | 0.13s | 5.38s | 1.85s | 600M | 78.8K |
| utoo-npm | 3.01s | 0.05s | 5.29s | 1.86s | 597M | 83.5K |
| utoo | 3.34s | 0.53s | 5.31s | 1.90s | 599M | 89.5K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 7.6K | 4.5K | 202M | 3M | 105M | - | 1M |
| utoo-next | 67.1K | 113.2K | 198M | 2M | 7M | 3M | 2M |
| utoo-npm | 65.7K | 104.9K | 198M | 2M | 7M | 3M | 2M |
| utoo | 65.6K | 105.1K | 198M | 2M | 7M | 3M | 2M |
p3_cold_install
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 6.65s | 0.28s | 6.08s | 9.73s | 632M | 212.7K |
| utoo-next | 7.37s | 1.80s | 5.28s | 11.27s | 963M | 121.9K |
| utoo-npm | 7.84s | 2.93s | 5.25s | 11.25s | 801M | 114.9K |
| utoo | 5.97s | 0.14s | 5.18s | 10.88s | 898M | 124.3K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 4.2K | 6.0K | 994M | 3M | 1.74G | 1.74G | 1M |
| utoo-next | 108.5K | 70.4K | 964M | 3M | 1.67G | 1.67G | 2M |
| utoo-npm | 110.3K | 72.0K | 964M | 3M | 1.67G | 1.67G | 2M |
| utoo | 88.2K | 62.3K | 964M | 2M | 1.67G | 1.67G | 2M |
p4_warm_link
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 3.28s | 0.09s | 0.20s | 2.31s | 135M | 31.7K |
| utoo-next | 2.19s | 0.21s | 0.51s | 3.73s | 80M | 18.5K |
| utoo-npm | 2.06s | 0.15s | 0.51s | 3.80s | 84M | 19.4K |
| utoo | 2.21s | 0.04s | 0.49s | 3.80s | 81M | 18.6K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 303 | 23 | 5M | 14K | 1.88G | 1.72G | 1M |
| utoo-next | 40.3K | 18.0K | 320K | 7K | 1.68G | 1.68G | 2M |
| utoo-npm | 45.9K | 20.9K | 335K | 27K | 1.68G | 1.68G | 2M |
| utoo | 42.5K | 19.7K | 320K | 7K | 1.68G | 1.68G | 2M |
npmmirror.com: no output captured.
fireairforce approved these changes on May 7, 2026
Why
The phases bench iterates PMs in fixed order (`utoo, utoo-next, utoo-npm, bun` per `PM_LIST`) within each phase. The first PM pays the cold-network tax — DNS resolver miss, TLS session ticket cold, npm CDN edge POP unpopulated — and `--runs 3` averages that cold iteration into the wall mean. PMs that run later inherit the warm state for free.
A standalone smoke test (`pm-bench-pcap.sh` from #2906, identical-binary run on `chore/pcap-install-phase`) confirmed the magnitude: the same utoo binary, run twice back-to-back, showed +3s wall and +267 MB pcap delta on `p3_install` between the first and second run — pure CDN/DNS/TLS warm-up effect, no code difference.
Folded through `--runs 3` averaging that's roughly +1s on the first PM's p0 mean, the same magnitude as the per-PR deltas on #2903 / #2904 / #2905 (TTY-gate +0.98, streaming +0.49, zero-copy +1.18). Those means are currently indistinguishable from ordering bias.
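The +1s figure is plain averaging: one cold run carrying the ~3s penalty, spread over three timed runs. A minimal sketch of that arithmetic, assuming exactly one cold iteration per PM × phase; `mean_bias` is a hypothetical helper, not part of the bench scripts:

```shell
#!/usr/bin/env bash
# Bias added to a hyperfine mean when exactly one timed run is cold.
# $1: extra wall seconds on the cold run (3, from the smoke test)
# $2: number of timed runs averaged together (3, per --runs 3)
mean_bias() {
  local cold_penalty_s=$1 runs=$2
  awk -v p="$cold_penalty_s" -v n="$runs" 'BEGIN { printf "%.2f\n", p / n }'
}

mean_bias 3 3   # prints 1.00
```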
Change
```diff
   if ! hyperfine \
     --runs "$RUNS" \
+    --warmup 1 \
     --prepare "bash $prep_script" \
```
`hyperfine --warmup 1` runs one untimed iteration through `--prepare` before the timed runs start, so each PM enters its measurement window with DNS / TLS / CDN already warm. Cross-PM ordering bias in the timed window collapses.
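For reference, a rough sketch of how the flags fit together. It only assembles and prints the command line, so it can be read without hyperfine installed. `build_hyperfine_cmd`, `./prep.sh` and `./phase.sh` are placeholders for illustration; the flag order inside `bench/pm-bench-phases.sh` may differ.

```shell
#!/usr/bin/env bash
# Assemble (but do not execute) a hyperfine command line with one warmup run.
# All arguments are placeholder values, not taken from the bench script.
build_hyperfine_cmd() {
  local runs=$1 prep_script=$2 bench_cmd=$3
  local -a cmd=(
    hyperfine
    --warmup 1                      # one untimed iteration before the timed runs
    --runs "$runs"                  # timed iterations that feed the mean
    --prepare "bash $prep_script"   # setup command, as in the existing invocation
    "$bench_cmd"
  )
  printf '%q ' "${cmd[@]}"
  printf '\n'
}

build_hyperfine_cmd 3 ./prep.sh 'bash ./phase.sh'
```

Replacing the final `printf` pair with `"${cmd[@]}"` would actually run the benchmark.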
Cost: one extra iteration per PM × phase (≈ 33% bench job wall), well within tolerance.
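The ≈ 33% follows from the iteration count alone: one untimed run on top of three timed ones. A small sketch, assuming every iteration takes about the same wall time and ignoring fixed job overhead; `warmup_overhead_pct` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Extra bench wall, in percent, from adding warmup iterations.
# $1: timed runs (--runs)   $2: warmup runs (--warmup)
warmup_overhead_pct() {
  local runs=$1 warmup=$2
  awk -v r="$runs" -v w="$warmup" 'BEGIN { printf "%.0f\n", 100 * w / r }'
}

warmup_overhead_pct 3 1   # prints 33
```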
Smoke test
PR is benchmark-labeled. Both `utoo` (this branch) and `utoo-next` (origin/next) compile from the same Rust code on this PR — only the bench script differs — so any `utoo-vs-utoo-next` p0 mean delta in the resulting comment is the residual ordering bias post-warmup. Target is ~0s.
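One way to read the resulting comment is to compare the `utoo` vs `utoo-next` wall delta against the reported ±σ. The sketch below is a crude heuristic, not a significance test: it assumes the two σ values can be combined in quadrature and treats a delta at or below that as noise. `delta_vs_noise` is a hypothetical helper; the example inputs are the `utoo` and `utoo-next` rows of the first table in the bench comment.

```shell
#!/usr/bin/env bash
# Compare two wall means and report whether the gap sits inside run-to-run noise.
# Args: mean_a sigma_a mean_b sigma_b (all in seconds).
delta_vs_noise() {
  awk -v ma="$1" -v sa="$2" -v mb="$3" -v sb="$4" 'BEGIN {
    d = ma - mb; if (d < 0) d = -d        # absolute difference of the means
    noise = sqrt(sa * sa + sb * sb)        # sigmas combined in quadrature (assumption)
    verdict = (d <= noise) ? "within-noise" : "outside-noise"
    printf "delta=%.2fs noise=%.2fs %s\n", d, noise, verdict
  }'
}

# utoo: 8.69s ± 0.75s, utoo-next: 8.15s ± 0.49s
delta_vs_noise 8.69 0.75 8.15 0.49
```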
Companion
🤖 Generated with Claude Code