[specdec_bench] Stratify --num_requests across categories by milesial · Pull Request #1389 · NVIDIA/Model-Optimizer

milesial · 2026-05-04T21:49:34Z

When the dataset has a category column with >1 distinct categories and --num_requests N is below the dataset size, take ceil(N / num_categories) rows from each category and round-robin interleave them so any prefix is balanced. Falls back to the existing range(N) slice when category metadata is absent or there's only one category.

Fixes a sampling bug where SPEED-Bench parquet files are sorted by category, so e.g. --num_requests 64 on throughput_8k pulls 64 high_entropy prompts and zero from low_entropy / mixed.

Summary by CodeRabbit

New Features
- SPEEDBench dataset truncation now performs deterministic, balanced stratified sampling across categories (round-robin) when multiple categories exist and a smaller sample size is requested.
- Falls back to simple prefix selection when no category or only one category is present, preserving prior behavior in that case.

When the dataset has a `category` column with >1 distinct categories and ``--num_requests N`` is below the dataset size, take ceil(N / num_categories) rows from each category and round-robin interleave them so any prefix is balanced. Falls back to the existing ``range(N)`` slice when category metadata is absent or there's only one category. Fixes a sampling bug where SPEED-Bench parquet files are sorted by category, so e.g. ``--num_requests 64`` on throughput_8k pulls 64 high_entropy prompts and zero from low_entropy / mixed. Signed-off-by: Alexandre Milesi <milesial@users.noreply.github.com>

coderabbitai · 2026-05-04T21:49:45Z

📝 Walkthrough

Walkthrough

SPEEDBench dataset loading now only truncates when num_samples is set and smaller than the dataset. When truncating, if a multi-valued category column exists it deterministically round-robins rows across categories to collect exactly n samples; otherwise it falls back to selecting the first n rows.

Changes

Stratified Dataset Sampling

Layer / File(s)	Summary
Precondition / Truncation Gate `examples/specdec_bench/specdec_bench/datasets/speed.py` (around lines 740–743)	Truncation is applied only when `self.num_samples` is not None and strictly less than `len(dataset)`; previous unconditional prefix truncation was removed.
Core Selection Logic `examples/specdec_bench/specdec_bench/datasets/speed.py` (new method lines ~745–773)	Adds `SPEEDBench._stratified_select(dataset: "Dataset", n: int)` that: if `category` is absent or contains a single distinct value, returns `dataset.select(range(n))`; otherwise builds per-category index lists in original order and deterministically collects exactly `n` rows by round-robin interleaving across categories until `n` indices are gathered.
Integration `examples/specdec_bench/specdec_bench/datasets/speed.py` (caller around lines 740–743)	`_load_dataset(...)` delegates to `self._stratified_select(dataset, self.num_samples)` when truncation is needed.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 6

✅ Passed checks (6 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically describes the main change: implementing stratified sampling of requests across dataset categories in the SPEED-Bench dataset loader.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns	✅ Passed	Pull request does not introduce any of the six critical security anti-patterns specified in SECURITY.md.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/specdec_bench/specdec_bench/datasets/speed.py`:
- Around line 765-777: The interleaving logic can produce fewer than n items
when some categories have fewer than per_cat rows; update the loop that builds
interleaved (currently using per_cat and cat_samples) to backfill by repeatedly
cycling through cat_samples and appending the next available item from each
category until interleaved has length n or all samples are exhausted, i.e.,
maintain per-category indices (or pop from lists) and in a while
len(interleaved) < n loop iterate over cat_samples appending one item per
non-empty category, then return dataset.select(interleaved[:n]) to guarantee
num_samples.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f07d27e0-5726-4c6e-ae35-bffedcb82b90

📥 Commits

Reviewing files that changed from the base of the PR and between 06ef935 and 9d05c40.

📒 Files selected for processing (1)

examples/specdec_bench/specdec_bench/datasets/speed.py

codecov · 2026-05-04T22:02:44Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.44%. Comparing base (06ef935) to head (29cd86f).
⚠️ Report is 6 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1389      +/-   ##
==========================================
+ Coverage   76.90%   77.44%   +0.54%     
==========================================
  Files         471      471              
  Lines       50565    50565              
==========================================
+ Hits        38885    39161     +276     
+ Misses      11680    11404     -276

Flag	Coverage Δ
examples	`41.55% <ø> (+0.89%)`	⬆️
unit	`52.80% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

h-guo18

Please take a look at CodeRabbit’s comment above. Other than that LGTM

Replace the two-step (per_cat clamp + interleave) selection with a single-pass round-robin walk over the per-category index lists. Smaller categories continue contributing until exhausted; larger categories then fill the remainder. This guarantees exactly ``n`` rows whenever ``n`` does not exceed the total dataset size, where the previous code could under-deliver if any category had fewer than ``ceil(n / num_categories)`` rows. The first ``n`` rows are still balanced (drop the same ``ceil`` / ``per_cat`` quantity); the only change is what happens past the exhaustion point of the smallest category. ``math`` is no longer needed. Signed-off-by: Alexandre Milesi <milesial@users.noreply.github.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/specdec_bench/specdec_bench/datasets/speed.py`:
- Around line 745-772: In _stratified_select, handle non-positive n early: if n
<= 0 return dataset.select([]) (or an empty index sequence) before any category
processing so the round-robin loop never appends indices for n==0 or negative n;
locate the function _stratified_select and add this early guard immediately
after the docstring (before checking "category" in dataset.column_names) so all
subsequent logic assumes n > 0.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 38099a4c-3298-464c-901c-7a504580c435

📥 Commits

Reviewing files that changed from the base of the PR and between 9d05c40 and 29cd86f.

📒 Files selected for processing (1)

examples/specdec_bench/specdec_bench/datasets/speed.py

milesial requested a review from a team as a code owner May 4, 2026 21:49

milesial requested a review from ChenhanYu May 4, 2026 21:49

coderabbitai Bot reviewed May 4, 2026

View reviewed changes

Comment thread examples/specdec_bench/specdec_bench/datasets/speed.py Outdated

h-guo18 approved these changes May 4, 2026

View reviewed changes

coderabbitai Bot reviewed May 4, 2026

View reviewed changes

Comment thread examples/specdec_bench/specdec_bench/datasets/speed.py

kevalmorabia97 approved these changes May 5, 2026

View reviewed changes

kevalmorabia97 added the cherry-pick-0.44.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc label May 5, 2026

h-guo18 merged commit 7097a69 into NVIDIA:main May 5, 2026
50 of 52 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[specdec_bench] Stratify --num_requests across categories#1389

[specdec_bench] Stratify --num_requests across categories#1389
h-guo18 merged 2 commits intoNVIDIA:mainfrom
milesial:specdec-stratify-num-requests

milesial commented May 4, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 4, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

codecov Bot commented May 4, 2026 •

edited

Loading

Uh oh!

h-guo18 left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

milesial commented May 4, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

codecov Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

h-guo18 left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

milesial commented May 4, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 4, 2026 •

edited

Loading

codecov Bot commented May 4, 2026 •

edited

Loading