perf(pm): Plan B — inline mb_fetch + per-iter spawn_blocking (vs PR #2920 channel)#2922

Closed
elrrrrrrr wants to merge 1 commit into next from perf/inline-graph-spawn-blocking

Conversation

@elrrrrrrr
Contributor

A/B branch from origin/next testing whether the channel rewrite in PR #2916/#2920 was overkill. Brings in PR #2920's full work but rewrites the hot loop to remove channels: graph state is owned by the main task and shipped into spawn_blocking each iteration. Tradeoff: per-iteration dispatch overhead vs simpler code. The benchmark results will determine whether the channel architecture is justified.

Generated with Claude Code.

… B A/B

A/B branch from origin/next to test whether the channel-based
fetch+graph separation in PR #2916/#2920 was overkill.

Brings in PR #2920's full work as baseline (mb_fetch_with_graph +
install integration + alias normalization + worker_threads ≥ 4 +
manifest-bench / preload-bench helpers) but rewrites mb_fetch_with_graph's
hot loop to NOT use channels:

- No fetch_tx / specs_tx channels, no long-lived graph_worker task
- Graph state (graph + edge_targets + seen_specs) owned by main task
- After each fetched FetchOutcome, main loop ships state into a
  `tokio::task::spawn_blocking` closure that mutates it for one
  event, returning the updated state + new pending specs
- Main loop awaits the spawn_blocking, gets state back, refills futs,
  continues
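The ownership round-trip described above — state moved into a blocking closure, mutated for one event, and handed back — can be sketched as follows. This is a minimal illustration, not the PR's code: `GraphState`, `FetchOutcome`, and `apply_event` are hypothetical stand-ins, and `std::thread::spawn` stands in for `tokio::task::spawn_blocking` so the sketch runs without a tokio runtime:

```rust
use std::collections::HashSet;

// Hypothetical stand-ins for the graph state and fetch event types.
#[derive(Default)]
struct GraphState {
    seen_specs: HashSet<String>,
    edges: usize,
}

struct FetchOutcome {
    transitives: Vec<String>,
}

// Mutate the state for one fetch event and return it together with the
// specs that still need fetching — this mirrors the closure body the
// main loop would ship into `tokio::task::spawn_blocking`.
fn apply_event(mut state: GraphState, out: FetchOutcome) -> (GraphState, Vec<String>) {
    let mut pending = Vec::new();
    for spec in out.transitives {
        if state.seen_specs.insert(spec.clone()) {
            state.edges += 1;
            pending.push(spec);
        }
    }
    (state, pending)
}

fn main() {
    let mut state = GraphState::default();
    let events = vec![
        FetchOutcome { transitives: vec!["a@1".into(), "b@1".into()] },
        FetchOutcome { transitives: vec!["b@1".into(), "c@1".into()] },
    ];
    for out in events {
        // Ship owned state into the worker, await it, get state back.
        // No channels, no long-lived worker task — but the main loop is
        // blocked on this join and cannot drive other fetches meanwhile.
        let handle = std::thread::spawn(move || apply_event(state, out));
        let (next_state, pending) = handle.join().expect("worker panicked");
        state = next_state;
        println!("pending: {:?}", pending);
    }
    assert_eq!(state.edges, 3); // a@1, b@1, c@1 each added exactly once
}
```

The per-iteration cost the tradeoff section estimates is the dispatch and join overhead of that spawn/await pair, paid once per fetched package.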

Tradeoffs vs PR #2920:
- Pro: simpler — no channel bookkeeping, no in_flight_graph counter,
  no `tokio::select!` biased on specs_rx vs futs.next()
- Con: per-iter spawn_blocking dispatch overhead (~tens of µs × ~4647
  iters on ant-design = potentially hundreds of ms)
- Con: serializes fetch loop and graph mutation per event — main loop
  doesn't drive futs while spawn_blocking is awaited

If this branch's bench (linux + mac) approximates PR #2920's p0 σ
collapse and mean numbers, the channel rewrite was overkill and the
`worker_threads + spawn_blocking` fix could have shipped on its
own. If perf is materially worse, the channel architecture's
parallelism wins justify it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@elrrrrrrr added the `benchmark` (Run pm-bench on PR) label on May 10, 2026

@gemini-code-assist (Bot) left a comment


Code Review

This pull request implements performance optimizations for manifest fetching and dependency resolution, notably introducing a 'folded' preload and graph build phase for lockfile-only workloads and adding comprehensive instrumentation for timing analysis. It also includes new benchmarking crates and tunes runtime configurations for thread pools and concurrency. Feedback identifies a breaking change in package name validation that affects unnamed root projects and a potential O(N^2) complexity issue in the edge target processing loop. Further improvements were suggested to avoid swallowing manifest parse errors and to simplify redundant success-tracking logic.

Comment on lines +166 to +168
if pkg.name.is_empty() {
anyhow::bail!("Failed to get package name from package.json");
}

high

This mandatory check for a non-empty name in PackageInfo::from_package_json is a breaking change for projects with unnamed root package.json files (e.g., those used only for workspaces or scripts). Previously, this was explicitly allowed as noted in the removed comments and tests. If validation is required for specific commands like publish or link, it should be performed in those specific contexts rather than at the base loading level. Additionally, per the general rules, the error message should follow the format 'requires a non-empty X field'.

References
  1. When using '#[serde(default)]' in Rust, validation error messages for required fields should indicate that the field must be non-empty (e.g., 'requires a non-empty X field') because missing fields are indistinguishable from empty ones.
  2. For package linking operations (similar to 'npm link'), only the package name should be validated as required. Do not enforce a version check if the version is not used in the linking logic.
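The scoping this comment suggests — keep base loading permissive and validate only in the commands that actually need a name — might look like the following sketch. Everything here is hypothetical (`from_package_json`'s real signature differs; `require_name` is an invented helper); the error text follows the "requires a non-empty X field" convention the references cite:

```rust
#[derive(Default)]
struct PackageInfo {
    name: String,
    version: String,
}

// Base loading stays permissive: unnamed root package.json files
// (workspace- or scripts-only roots) load without error.
fn from_package_json(name: Option<&str>) -> PackageInfo {
    PackageInfo {
        name: name.unwrap_or_default().to_string(),
        ..Default::default()
    }
}

// Validation lives in the specific commands that need it (publish,
// link), not at the base loading level.
fn require_name(pkg: &PackageInfo) -> Result<&str, String> {
    if pkg.name.is_empty() {
        // Missing and empty are indistinguishable after defaulting,
        // so the message says "non-empty" rather than "missing".
        Err("package.json requires a non-empty `name` field".to_string())
    } else {
        Ok(&pkg.name)
    }
}

fn main() {
    let unnamed = from_package_json(None);
    assert!(require_name(&unnamed).is_err()); // publish/link would reject

    let named = from_package_json(Some("my-pkg"));
    assert_eq!(require_name(&named).unwrap(), "my-pkg");
    let _ = named.version; // unused in linking, per reference 2
}
```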

Comment on lines +283 to +288
let parsed = tokio::task::spawn_blocking(move || {
parse_combined(raw_arc, &real_spec_for_parse, peer)
})
.await
.ok()
.flatten();

high

The parse result from spawn_blocking is flattened and defaults to an empty vector of transitives on failure. This silently swallows manifest parse errors (e.g., malformed JSON or missing required fields), which can lead to incomplete dependency graphs without any indication to the user. These errors should be captured and reported as failures in MbFetchStats or returned as a Result to the main loop.
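The suggested fix — surfacing both failure layers instead of collapsing them with `.ok().flatten()` — could be sketched as below. The stats fields and `parse_combined` body are hypothetical, and `std::thread::spawn` stands in for `spawn_blocking` so the sketch runs without tokio; the point is the two-layer `match` that counts failures rather than defaulting to an empty transitive list:

```rust
// Hypothetical stats struct loosely mirroring MbFetchStats.
#[derive(Default, Debug)]
struct MbFetchStats {
    success: u64,
    parse_failures: u64,
}

// Fallible stand-in for `parse_combined`: returns Err on malformed
// input instead of silently yielding an empty transitive list.
fn parse_combined(raw: &str) -> Result<Vec<String>, String> {
    if raw.starts_with('{') {
        Ok(vec!["dep@1".to_string()])
    } else {
        Err(format!("malformed manifest: {raw:?}"))
    }
}

fn fetch_and_record(raw: String, stats: &mut MbFetchStats) -> Vec<String> {
    // Two failure layers: the worker itself (a panic, i.e. the
    // JoinError case) and the parse result. Count both as failures so
    // the dependency graph is never silently incomplete.
    let joined = std::thread::spawn(move || parse_combined(&raw)).join();
    match joined {
        Ok(Ok(transitives)) => {
            stats.success += 1;
            transitives
        }
        Ok(Err(e)) => {
            stats.parse_failures += 1;
            eprintln!("manifest parse failed: {e}");
            Vec::new()
        }
        Err(_) => {
            stats.parse_failures += 1;
            eprintln!("parse worker panicked");
            Vec::new()
        }
    }
}

fn main() {
    let mut stats = MbFetchStats::default();
    fetch_and_record("{\"name\":\"a\"}".into(), &mut stats);
    fetch_and_record("not json".into(), &mut stats);
    assert_eq!(stats.success, 1);
    assert_eq!(stats.parse_failures, 1);
}
```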

Comment on lines +464 to +472
if out.transitives.is_empty() && out.fetched {
// Empty result from a fetch is ambiguous (no transitives
// OR a fetch/parse failure). Track conservatively as
// success — the FETCH_TIMINGS-equivalent counter is
// omitted in this path on purpose to keep the future
// body lean.
stats.success += 1;
} else if out.fetched {
stats.success += 1;

medium

The logic here is redundant as both branches perform the same action (stats.success += 1). It can be simplified to a single check on out.fetched.

        if out.fetched {
            stats.success += 1;
        }

Comment on lines +890 to +894
let primary_keys: Vec<(String, String)> = edge_targets
.keys()
.filter(|(n, _)| n == &event_name)
.cloned()
.collect();

medium

Iterating over all keys in edge_targets to filter by event_name results in O(N) complexity per fetch result, leading to O(N^2) overall complexity for the resolution process. Given that this is a hot loop intended for performance, consider restructuring edge_targets (e.g., using a nested HashMap keyed by package name) to allow for O(1) lookups of pending specs for a given package.
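The suggested restructuring — keying the outer map by package name so the per-event lookup is one hash probe instead of a scan over every key — might look like this sketch (types are hypothetical; `spec` stands for the version-requirement half of the old tuple key):

```rust
use std::collections::HashMap;

// Flat layout from the PR, HashMap<(name, spec), targets>, forces an
// O(N) scan of all keys to find entries for one package name.
// Nested layout: one O(1) lookup per fetch event.
type EdgeTargets = HashMap<String, HashMap<String, Vec<usize>>>;

fn insert_edge(map: &mut EdgeTargets, name: &str, spec: &str, target: usize) {
    map.entry(name.to_string())
        .or_default()
        .entry(spec.to_string())
        .or_default()
        .push(target);
}

// Drain all pending (spec, targets) pairs for one package name —
// replaces the filter-over-all-keys loop from lines +890 to +894.
fn take_for_package(map: &mut EdgeTargets, event_name: &str) -> Vec<(String, Vec<usize>)> {
    map.remove(event_name)
        .map(|inner| inner.into_iter().collect())
        .unwrap_or_default()
}

fn main() {
    let mut edge_targets = EdgeTargets::new();
    insert_edge(&mut edge_targets, "react", "^18.0.0", 1);
    insert_edge(&mut edge_targets, "react", "^17.0.0", 2);
    insert_edge(&mut edge_targets, "lodash", "^4.17.0", 3);

    let pending = take_for_package(&mut edge_targets, "react");
    assert_eq!(pending.len(), 2);
    assert!(edge_targets.contains_key("lodash"));
    assert!(!edge_targets.contains_key("react")); // drained in one probe
}
```

Using `remove` to drain the inner map also avoids the clone-then-collect of the original filter loop.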

@github-actions

📊 pm-bench-phases · f0a0a73 · linux (ubuntu-latest)

Workflow run — ant-design

PMs: utoo (this branch) · utoo-npm (latest published) · bun (latest)

npmjs.org

p0_full_cold

PM wall ±σ user sys RSS pgMinor
bun 7.58s 0.20s 10.33s 6.35s 615M 281.8K
utoo-next 7.28s 0.26s 11.00s 7.96s 992M 120.6K
utoo-npm 9.06s 1.32s 11.91s 8.81s 1.43G 175.2K
utoo 8.28s 0.35s 11.56s 8.41s 1.58G 209.2K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 22.3K 18.7K 1.20G 7M 1.89G 1.77G 1M
utoo-next 136.4K 105.7K 1.17G 5M 1.73G 1.73G 2M
utoo-npm 171.7K 137.0K 1.18G 6M 1.73G 1.73G 2M
utoo 158.1K 153.0K 1.18G 6M 1.73G 1.73G 2M

p1_resolve

PM wall ±σ user sys RSS pgMinor
bun 2.31s 0.18s 4.40s 0.86s 496M 184.3K
utoo-next 3.14s 0.15s 5.64s 1.50s 613M 84.9K
utoo-npm 3.08s 0.07s 5.57s 1.50s 610M 83.4K
utoo 3.35s 0.13s 5.79s 1.02s 927M 132.3K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 12.0K 4.4K 204M 3M 107M - 1M
utoo-next 83.5K 123.9K 201M 3M 7M 3M 2M
utoo-npm 83.9K 121.6K 201M 3M 7M 3M 2M
utoo 68.8K 7.5K 202M 3M - 3M 2M

p3_cold_install

PM wall ±σ user sys RSS pgMinor
bun 5.70s 0.24s 5.99s 6.35s 569M 197.8K
utoo-next 5.42s 0.11s 5.12s 6.95s 531M 66.2K
utoo-npm 8.36s 2.04s 5.93s 7.72s 945M 123.6K
utoo 5.09s 0.14s 5.03s 6.78s 617M 74.9K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 9.5K 7.9K 1.00G 5M 1.78G 1.78G 1M
utoo-next 112.5K 52.5K 1002M 3M 1.73G 1.73G 2M
utoo-npm 143.7K 88.3K 1002M 4M 1.73G 1.73G 2M
utoo 92.3K 76.8K 1001M 2M 1.73G 1.73G 2M

p4_warm_link

PM wall ±σ user sys RSS pgMinor
bun 2.11s 0.08s 0.15s 1.22s 140M 33.2K
utoo-next 2.06s 0.07s 0.53s 2.42s 80M 18.7K
utoo-npm 1.93s 0.07s 0.57s 2.46s 84M 19.2K
utoo 1.88s 0.01s 0.52s 2.39s 79M 18.2K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 395 17 5M 33K 1.93G 1.77G 1M
utoo-next 41.4K 15.7K 2K 16K 1.73G 1.73G 2M
utoo-npm 47.1K 17.7K 16K 12K 1.73G 1.73G 2M
utoo 40.7K 15.2K 4K 9K 1.73G 1.73G 2M

npmmirror.com

p0_full_cold

PM wall ±σ user sys RSS pgMinor
bun 26.85s 0.37s 10.04s 7.00s 537M 377.5K
utoo-next 40.54s 12.47s 7.61s 8.67s 499M 80.5K
utoo-npm 32.38s 2.41s 8.30s 9.48s 759M 137.3K
utoo 27.21s 1.36s 12.73s 9.34s 1.48G 206.7K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 162.0K 3.8K 1.16G 17M 1.89G 1.77G 2M
utoo-next 236.2K 87.5K 1.01G 10M 1.73G 1.73G 2M
utoo-npm 263.7K 141.8K 1.02G 10M 1.73G 1.73G 2M
utoo 267.2K 153.5K 1.15G 12M 1.73G 1.73G 2M

p1_resolve

PM wall ±σ user sys RSS pgMinor
bun 3.44s 0.11s 4.03s 1.01s 507M 221.0K
utoo-next 4.83s 0.03s 1.30s 0.68s 87M 21.2K
utoo-npm 4.82s 0.06s 1.21s 0.66s 86M 21.5K
utoo 3.56s 0.03s 5.85s 0.99s 896M 116.8K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 27.0K 3.3K 155M 4M 109M - 2M
utoo-next 47.3K 22.7K 16M 2M - 3M 2M
utoo-npm 47.2K 28.3K 16M 2M - 3M 2M
utoo 64.1K 5.4K 151M 3M - 3M 3M

p3_cold_install

PM wall ±σ user sys RSS pgMinor
bun 22.85s 1.33s 6.37s 6.22s 230M 86.1K
utoo-next 40.73s 4.70s 6.23s 7.75s 413M 60.8K
utoo-npm 39.45s 3.20s 6.96s 8.71s 649M 105.9K
utoo 33.78s 2.22s 6.20s 7.79s 491M 84.0K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 136.2K 2.6K 1022M 13M 1.74G 1.74G 2M
utoo-next 191.8K 42.8K 1.01G 8M 1.73G 1.73G 3M
utoo-npm 217.4K 77.2K 1021M 9M 1.73G 1.73G 3M
utoo 185.4K 137.0K 1013M 8M 1.73G 1.73G 3M

p4_warm_link

PM wall ±σ user sys RSS pgMinor
bun 2.07s 0.12s 0.15s 1.19s 139M 32.1K
utoo-next 1.86s 0.09s 0.57s 2.37s 83M 18.7K
utoo-npm 1.85s 0.05s 0.56s 2.46s 83M 19.4K
utoo 1.84s 0.07s 0.54s 2.38s 81M 18.3K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 681 33 2M 49K 1.87G 1.77G 2M
utoo-next 43.8K 17.3K 25K 11K 1.73G 1.72G 3M
utoo-npm 49.8K 18.5K 22K 5K 1.73G 1.72G 3M
utoo 41.4K 15.4K 25K 12K 1.73G 1.72G 3M

@github-actions

📊 pm-bench-phases · f0a0a73 · mac (macos-latest)

Workflow run — ant-design

PMs: utoo (this branch) · utoo-npm (latest published) · bun (latest)

npmjs.org

p0_full_cold

PM wall ±σ user sys RSS pgMinor
bun 19.32s 3.40s 7.03s 20.73s 786M 50.8K
utoo-npm 26.94s 1.24s 11.77s 29.87s 1004M 100.7K
utoo 21.02s 3.79s 10.37s 24.25s 893M 67.5K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 16.2K 140.7K - - 1.80G 1.94G 1M
utoo-npm 13.3K 325.7K - - 1.67G 1.87G 2M
utoo 14.4K 332.3K - - 1.64G 1.91G 3M

p1_resolve

PM wall ±σ user sys RSS pgMinor
bun 2.14s 0.14s 2.65s 1.21s 500M 32.5K
utoo-npm 3.91s 0.65s 4.55s 2.64s 598M 40.4K
utoo 3.41s 0.31s 4.33s 1.14s 613M 40.8K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 33 20.8K - - 112M - 1M
utoo-npm 10 151.6K - - 27M 3M 3M
utoo 7 65.3K - - - 3M 3M

p3_cold_install

PM wall ±σ user sys RSS pgMinor
bun 19.66s 0.50s 4.23s 22.05s 555M 36.1K
utoo-npm 15.78s 2.10s 4.54s 20.35s 758M 87.3K
utoo 14.22s 2.69s 3.88s 19.85s 462M 31.1K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 4.1K 127.9K - - 1.67G 1.94G 1M
utoo-npm 1.5K 227.7K - - 1.64G 1.87G 2M
utoo 1.6K 252.6K - - 1.64G 1.87G 2M

p4_warm_link

PM wall ±σ user sys RSS pgMinor
bun 7.13s 0.53s 0.15s 3.18s 47M 3.6K
utoo-npm 7.53s 0.35s 0.75s 5.28s 100M 7.3K
utoo 7.77s 0.24s 0.55s 5.17s 80M 6.1K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 15.4K 689 - - 1.78G 1.95G 1M
utoo-npm 12.6K 76.9K - - 1.64G 1.89G 3M
utoo 14.5K 69.6K - - 1.64G 1.89G 3M

npmmirror.com

p0_full_cold

PM wall ±σ user sys RSS pgMinor
bun 32.52s 5.72s 7.10s 21.08s 566M 36.7K
utoo-npm 37.70s 15.46s 7.56s 22.75s 712M 76.2K
utoo 22.46s 0.93s 9.58s 19.92s 946M 67.2K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 14.7K 157.3K - - 1.82G 1.94G 2M
utoo-npm 3.9K 437.4K - - 1.64G 1.92G 3M
utoo 3.1K 419.6K - - 1.64G 1.88G 3M

p1_resolve

PM wall ±σ user sys RSS pgMinor
bun 4.04s 0.06s 3.00s 1.92s 513M 33.4K
utoo-npm 5.67s 0.19s 1.49s 0.84s 96M 6.9K
utoo 4.48s 0.75s 4.67s 1.35s 567M 38.2K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 38 27.8K - - 114M - 2M
utoo-npm 7 225.3K - - - 3M 2M
utoo 18 70.4K - - - 3M 3M

p3_cold_install

PM wall ±σ user sys RSS pgMinor
bun 14.81s 1.17s 3.61s 16.28s 357M 23.4K
utoo-npm 66.48s 49.25s 6.02s 22.28s 588M 74.3K
utoo 25.75s 0.30s 5.27s 21.96s 492M 32.8K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 2.1K 145.5K - - 1.67G 1.94G 2M
utoo-npm 2.1K 336.9K - - 1.64G 1.90G 3M
utoo 1.7K 336.7K - - 1.64G 1.90G 3M

p4_warm_link

PM wall ±σ user sys RSS pgMinor
bun 7.11s 0.47s 0.11s 2.89s 47M 3.6K
utoo-npm 5.76s 1.16s 0.53s 3.72s 96M 7.1K
utoo 4.77s 0.41s 0.41s 2.98s 87M 6.7K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 14.0K 664 - - 1.79G 1.94G 2M
utoo-npm 12.9K 68.5K - - 1.64G 1.89G 2M
utoo 14.9K 56.5K - - 1.64G 1.89G 2M

@elrrrrrrr
Contributor Author

Plan B vs #2920 — channel-based approach won, see #2924.

@elrrrrrrr closed this on May 11, 2026