
chore(bench): extend pcap to install phase + utoo-next baseline#2906

Closed

elrrrrrrr wants to merge 4 commits into next from chore/pcap-install-phase


Conversation

@elrrrrrrr
Contributor

What

The pcap-only bench was previously a one-off that captured p1_resolve across utoo and bun, and assumed the project was already cloned by pm-bench-phases.sh. Install-phase regressions (#2902 / #2903 / #2904 / #2905 σ widening on p0_full_cold) live in the tarball download path — not in resolve.

This PR makes the pcap bench self-contained and covers both phases for three PMs.

Script changes (bench/pm-bench-pcap.sh)

  • Self-clone the project if $PROJECT_DIR is missing (mirrors pm-bench-phases.sh)
  • Add a <pm>-install capture per PM: lock pre-existing, cache + node_modules wiped, then <pm> install. This is the cold-tarball-download phase where the σ-widening lives.
  • Add utoo-next as a third PM, gated on $UTOO_NEXT_BIN so local runs still work without it.
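The per-PM sequencing above can be sketched in plain bash. This is a hypothetical outline, not the script itself: `capture_one` is stubbed out (the real version wraps tcpdump), and the function and variable names are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the resolve -> wipe -> cold-install sequencing;
# capture_one is a stub standing in for the real tcpdump wrapper.
set -euo pipefail
cd "$(mktemp -d)"

capture_one() { echo "capture: $1"; }   # stub: real version records a .pcap

run_pm_phases() {
  local pm_name=$1 pm_bin=$2
  local cache_dir="./${pm_name}-cache"
  capture_one "${pm_name}-resolve"      # lockfile-only resolve phase
  rm -rf "$cache_dir" node_modules      # wipe cache: force cold tarball downloads
  capture_one "${pm_name}-install"      # cold install phase
}

run_pm_phases utoo utoo
if [ -n "${UTOO_NEXT_BIN:-}" ]; then    # gate: local runs may lack the binary
  run_pm_phases utoo-next "$UTOO_NEXT_BIN"
fi
run_pm_phases bun bun
```

The `${UTOO_NEXT_BIN:-}` gate is what lets the same script run locally (two PMs) and in CI (three PMs) without edits.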

Workflow changes (.github/workflows/pm-e2e-bench.yml)

  • pm-bench-pcap-linux now downloads the utoo-next-linux-x64 artifact and exports UTOO_NEXT_BIN, exactly like bench-phases-linux does.
  • The `Build next branch utoo` and `Upload utoo-next binary` steps in build-linux now also fire for `inputs.target == 'pm-bench-pcap'`, not only `pm-bench-phases`.

Outputs

```
/tmp/pm-bench-pcap/
  dns.txt
  utoo-{resolve,install}.{pcap,log}
  utoo-next-{resolve,install}.{pcap,log}   (when UTOO_NEXT_BIN set)
  bun-{resolve,install}.{pcap,log}
```

Why

Drives the analysis of whether the install hot-path's increased concurrency (FuturesUnordered streaming, zero-copy tar, TTY-gate) saturates outbound TCP and starves the download path — the suspect mechanism for σ widening on p0_full_cold across the #2903 / #2904 / #2905 split experiments.

Smoke test

Dispatched on this chore branch first: the utoo and utoo-next binaries are built from the same code (only the bench scripts differ), so any utoo-vs-utoo-next install-phase delta on this run is pure environment noise. That establishes the noise floor before we run on the experiment branches.

🤖 Generated with Claude Code

The pcap-only bench was previously a one-off that captured `p1_resolve`
across `utoo` and `bun`, and assumed the project tree was already
cloned by `pm-bench-phases.sh` running in the same job. That gave us
metadata fan-out, but install-phase regressions (#2902 / #2903 /
#2904 / #2905 σ widening on `p0_full_cold`) live in the tarball
download path, not in resolve.

This commit makes the pcap bench self-contained and covers both
phases for three PMs:

- Self-clone the project if `$PROJECT_DIR` is missing (mirrors
  `pm-bench-phases.sh`), so this script runs as a standalone CI job.
- Add a `<pm>-install` capture per PM: lock pre-existing,
  `cache + node_modules` wiped, then `<pm> install`. This is the
  cold-tarball-download phase where the σ-widening lives.
- Add `utoo-next` as a third PM: built upstream by `build-linux`'s
  bench-baseline step (now also gated on `pm-bench-pcap`), downloaded
  via the same artifact path as `bench-phases-linux`. Skipped in
  local runs where `$UTOO_NEXT_BIN` is unset.

Workflow change:

- `pm-bench-pcap-linux` now downloads the `utoo-next-linux-x64`
  artifact and exports `UTOO_NEXT_BIN` exactly like
  `bench-phases-linux` does.
- `Build next branch utoo` and `Upload utoo-next binary` steps in
  `build-linux` now also fire for `inputs.target == 'pm-bench-pcap'`,
  not only `pm-bench-phases`.

Outputs in `/tmp/pm-bench-pcap`:

  dns.txt
  utoo-{resolve,install}.{pcap,log}
  utoo-next-{resolve,install}.{pcap,log}   (when UTOO_NEXT_BIN set)
  bun-{resolve,install}.{pcap,log}

Drives the analysis of whether the install hot-path's increased
concurrency (FuturesUnordered streaming, zero-copy tar, TTY-gate)
saturates outbound TCP and starves the download path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@gemini-code-assist (bot) left a comment


Code Review

This pull request refactors the bench/pm-bench-pcap.sh script to support capturing network traces for both the resolve and install phases across different package managers, including a new baseline option for utoo-next. It introduces a run_pm_phases function to manage the lifecycle of these phases and ensures a cold environment by clearing caches. The review feedback recommends using Bash arrays to store common command-line arguments for bun and utoo to improve maintainability and reduce duplication.

Comment thread bench/pm-bench-pcap.sh
Comment on lines +99 to +105
```
BUN_INSTALL_CACHE_DIR="$cache_dir" \
capture_one "${pm_name}-resolve" \
"$pm_bin" install --lockfile-only --registry="$REGISTRY"
rm -rf "$cache_dir" node_modules
BUN_INSTALL_CACHE_DIR="$cache_dir" \
capture_one "${pm_name}-install" \
"$pm_bin" install --registry="$REGISTRY"
```

Severity: medium

The --registry argument and the BUN_INSTALL_CACHE_DIR environment variable prefix are duplicated for the bun commands. To improve maintainability and reduce redundancy, you could extract the common argument into an array. This will make future changes to arguments easier.

Suggested change:

```
local bun_args=(--registry="$REGISTRY")
BUN_INSTALL_CACHE_DIR="$cache_dir" \
capture_one "${pm_name}-resolve" \
"$pm_bin" install --lockfile-only "${bun_args[@]}"
rm -rf "$cache_dir" node_modules
BUN_INSTALL_CACHE_DIR="$cache_dir" \
capture_one "${pm_name}-install" \
"$pm_bin" install "${bun_args[@]}"
```

Comment thread bench/pm-bench-pcap.sh
Comment on lines +107 to +111
```
capture_one "${pm_name}-resolve" \
"$pm_bin" deps --registry="$REGISTRY" --cache-dir="$cache_dir"
rm -rf "$cache_dir" node_modules
capture_one "${pm_name}-install" \
"$pm_bin" install --registry="$REGISTRY" --cache-dir="$cache_dir"
```

Severity: medium

The arguments --registry and --cache-dir are duplicated across the utoo commands. You can extract these common arguments into an array to improve maintainability and reduce redundancy. This makes it easier to modify arguments in the future.

Suggested change:

```
local utoo_args=(--registry="$REGISTRY" --cache-dir="$cache_dir")
capture_one "${pm_name}-resolve" \
"$pm_bin" deps "${utoo_args[@]}"
rm -rf "$cache_dir" node_modules
capture_one "${pm_name}-install" \
"$pm_bin" install "${utoo_args[@]}"
```

elrrrrrrr and others added 3 commits May 7, 2026 16:11
Building on the install-phase pcap capture from the previous commit,
post-process each .pcap with tshark to extract pre-TLS metrics that
directly probe the "install greediness starves download" hypothesis
without needing TLS session-key dumping:

  zero_windows    — receive buffer full → server paused. Direct evidence
                    that the app's tokio runtime is not draining the
                    socket fast enough between extracts.
  retransmits     — server resent because ACK was late. Indirect
                    evidence of receive-side stall.
  duplicate_acks  — receiver re-sent ACK because it perceived a gap.
  stream_gap_*    — inter-packet gap distribution per TCP stream
                    (p50 / p99 / max in microseconds). p99 / max measure
                    the longest pause an active connection experienced —
                    if utoo shows multi-hundred-ms gaps where utoo-next
                    shows tens of ms, install is freezing the runtime
                    mid-download.

Per-capture summaries land at $PCAP_DIR/<name>.summary.json. They are
aggregated into a top-level summary.json via jq -s, so artifact
consumers can compare metrics across PMs without re-parsing the 100s
of MB of raw pcaps.

Single-pass tshark over the pcap with -T fields keeps cost bounded
to ~1 minute per 1 GB of capture; the full analysis pass runs after
all captures so it does not bleed into wall-clock measurement.
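A minimal sketch of the per-capture extraction and the `jq -s` roll-up. The
`tcp.analysis.*` names are standard tshark display filters; the function
names and file layout are illustrative, and the stream-gap percentiles
are omitted here for brevity.

```shell
# Hypothetical sketch: count pre-TLS TCP pathologies in one pcap and
# emit a per-capture summary JSON.
count_frames() {  # $1 = pcap file, $2 = tshark display filter
  tshark -r "$1" -Y "$2" -T fields -e frame.number | wc -l
}

summarize_pcap() {
  local name=$1 pcap=$2
  printf '{"name":"%s","zero_windows":%d,"retransmits":%d,"duplicate_acks":%d}\n' \
    "$name" \
    "$(count_frames "$pcap" tcp.analysis.zero_window)" \
    "$(count_frames "$pcap" tcp.analysis.retransmission)" \
    "$(count_frames "$pcap" tcp.analysis.duplicate_ack)" \
    > "$name.summary.json"
}

# Roll every per-capture summary into one artifact:
#   jq -s '.' "$PCAP_DIR"/*.summary.json > "$PCAP_DIR/summary.json"
```

`jq -s` slurps the individual objects into one JSON array, so downstream tooling reads a single file instead of N per-capture summaries.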

Workflow change:

  Install pcap tools step now also installs tshark + jq, with
  wireshark-common pre-seeded so tshark installs non-interactively
  (we only read existing pcaps, no setuid dumpcap needed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The analysis pass aborted the whole job on the first .pcap because
the wall-time grep returned no match, and `set -eo pipefail`
propagated that exit-1 through the `local x; x=$(grep | awk)`
assignment (unlike the one-line `local x=$(...)` form, the two-step
form does NOT mask the assignment's exit code: a classic bash gotcha).

Two-part fix:

1. Drop into `set +e` / `set +o pipefail` for the analysis function
   body. The metrics are diagnostic — one tshark hiccup or an empty
   log line should not nuke a 25-minute capture run. Strict mode is
   restored at the end of the function so the rest of the script
   keeps its safety net.

2. Replace `grep -oE | awk` with awk-only. awk returns 0 even when
   no record matches, so empty-result log files no longer trip
   pipefail. Same parse, fewer pipes.
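The two bash behaviors the fix leans on can be demonstrated in a few
lines; the function names below are illustrative, not from the script.

```shell
#!/usr/bin/env bash
# Hypothetical demo of the two bash behaviors behind the fix.
set -eo pipefail

one_line_form() {
  # `local` itself exits 0, so the substituted command's failure is
  # masked and set -e does not trigger.
  local x=$(false)
  echo "one-line rc:$?"
}

two_step_form() {
  # The two-step form surfaces the assignment's real exit code.
  local x
  if ! x=$(false); then echo "two-step rc:1"; fi
}

awk_only_parse() {
  # awk exits 0 even with zero matching records, so an empty log
  # cannot trip pipefail the way `grep -oE | awk` did.
  printf 'no wall time here\n' | awk '/^wall/ {print $2}'
  echo "awk rc:$?"
}
```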

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…iagnosis

The TCP-level analysis (zero-copy retx=123 vs baseline 4-18) gave
strong evidence that utoo's receiver runtime is under back-pressure
during install, but it doesn't tell us *why*. The leading hypothesis
is disk IO saturation: rayon's parallel `fs::create + write_all` over
80k+ files in the ant-design tarball burst can outrun GitHub Actions
runners' Azure-disk IOPS budget, blocking write threads → tokio
threads back up → socket buffers fill.

This commit adds an iostat-x sampler to each capture:

  capture_one() now spawns `iostat -x -y 1` in parallel with
  tcpdump, writing per-second device samples to
  $PCAP_DIR/<name>.iostat.txt. Both samplers are torn down with
  the workload command.
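A sketch of that dual-sampler lifecycle. The iostat and tcpdump flags
shown are real, but the teardown details and paths here are
illustrative, not lifted from the script.

```shell
# Hypothetical sketch: run iostat and tcpdump alongside the workload,
# then tear both samplers down once the workload command returns.
capture_one() {
  local name=$1; shift
  iostat -x -y 1 > "$PCAP_DIR/$name.iostat.txt" &
  local iostat_pid=$!
  tcpdump -i any -w "$PCAP_DIR/$name.pcap" &
  local tcpdump_pid=$!

  "$@"                                   # the workload command itself

  kill "$iostat_pid" "$tcpdump_pid" 2>/dev/null || true
  wait "$iostat_pid" "$tcpdump_pid" 2>/dev/null || true
}
```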

  analyze_pcap() parses the iostat log via column-position lookup
  (sysstat header row → column index map) and extracts:
    io_util_max_pct       — peak disk-busy percentage
    io_util_avg_pct       — average disk-busy percentage
    io_w_iops_max         — peak write IOPS
    io_w_kbs_max          — peak write throughput (kB/s)
    io_w_await_max_ms     — peak write queue wait (ms)
    io_samples            — sample count for sanity check

These six fields land in summary.json alongside the TCP metrics, so
artifact consumers can directly cross-correlate disk pressure with
TCP back-pressure within the same capture window.
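The header-row column mapping can be sketched as a single awk pass;
the field set is trimmed here (real `iostat -x` output has more
columns) and the function name is illustrative.

```shell
# Hypothetical sketch: map sysstat header names to column indexes,
# then track maxima over the per-second device samples.
parse_iostat() {
  awk '
    $1 == "Device" {                 # header row: name -> column index
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    col["%util"] && NF >= col["%util"] && $1 ~ /^[a-z]/ {
      if ($(col["%util"]) + 0 > util_max) util_max = $(col["%util"]) + 0
      if ($(col["w/s"]) + 0 > wiops_max) wiops_max = $(col["w/s"]) + 0
      n++
    }
    END { printf "io_util_max_pct=%s io_w_iops_max=%s io_samples=%d\n", util_max, wiops_max, n }
  ' "$1"
}
```

Indexing by header name rather than fixed position keeps the parse stable across sysstat versions that add or reorder columns.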

The decision rule:
  * If io_util_max_pct stays high (>80%) on the experiment branch
    while baseline same-PM utoo-next stays low → install path is
    saturating disk and that's the mechanism.
  * If both branches show similar low %util, disk is not the
    bottleneck and we keep looking (e.g. CPU contention).

Workflow: the apt install step adds `sysstat` (which provides
iostat). It is preinstalled on ubuntu-latest images today, but
declaring the dependency keeps the job resilient to future image
rebuilds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@elrrrrrrr elrrrrrrr marked this pull request as draft May 9, 2026 15:09
@elrrrrrrr
Contributor Author

pcap tooling rolled into #2924 stack (ci(pcap): install/preload/manifest captures). Archived.

@elrrrrrrr elrrrrrrr closed this May 11, 2026