Skip to content

[specdec_bench] Stratify --num_requests across categories#1389

Merged
h-guo18 merged 2 commits intoNVIDIA:mainfrom
milesial:specdec-stratify-num-requests
May 5, 2026
Merged

[specdec_bench] Stratify --num_requests across categories#1389
h-guo18 merged 2 commits intoNVIDIA:mainfrom
milesial:specdec-stratify-num-requests

Conversation

@milesial
Copy link
Copy Markdown
Contributor

@milesial milesial commented May 4, 2026

When the dataset has a category column with >1 distinct categories and --num_requests N is below the dataset size, take ceil(N / num_categories) rows from each category and round-robin interleave them so any prefix is balanced. Falls back to the existing range(N) slice when category metadata is absent or there's only one category.

Fixes a sampling bug where SPEED-Bench parquet files are sorted by category, so e.g. --num_requests 64 on throughput_8k pulls 64 high_entropy prompts and zero from low_entropy / mixed.

Summary by CodeRabbit

  • New Features
    • SPEEDBench dataset truncation now performs deterministic, balanced stratified sampling across categories (round-robin) when multiple categories exist and a smaller sample size is requested.
    • Falls back to simple prefix selection when no category or only one category is present, preserving prior behavior in that case.

When the dataset has a `category` column with >1 distinct categories
and ``--num_requests N`` is below the dataset size, take ceil(N /
num_categories) rows from each category and round-robin interleave
them so any prefix is balanced. Falls back to the existing
``range(N)`` slice when category metadata is absent or there's only
one category.

Fixes a sampling bug where SPEED-Bench parquet files are sorted by
category, so e.g. ``--num_requests 64`` on throughput_8k pulls 64
high_entropy prompts and zero from low_entropy / mixed.

Signed-off-by: Alexandre Milesi <milesial@users.noreply.github.com>
@milesial milesial requested a review from a team as a code owner May 4, 2026 21:49
@milesial milesial requested a review from ChenhanYu May 4, 2026 21:49
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 4, 2026

📝 Walkthrough

Walkthrough

SPEEDBench dataset loading now only truncates when num_samples is set and smaller than the dataset. When truncating, if a multi-valued category column exists it deterministically round-robins rows across categories to collect exactly n samples; otherwise it falls back to selecting the first n rows.

Changes

Stratified Dataset Sampling

Layer / File(s) Summary
Precondition / Truncation Gate
examples/specdec_bench/specdec_bench/datasets/speed.py (around lines 740–743)
Truncation is applied only when self.num_samples is not None and strictly less than len(dataset); previous unconditional prefix truncation was removed.
Core Selection Logic
examples/specdec_bench/specdec_bench/datasets/speed.py (new method lines ~745–773)
Adds SPEEDBench._stratified_select(dataset: "Dataset", n: int) that: if category is absent or contains a single distinct value, returns dataset.select(range(n)); otherwise builds per-category index lists in original order and deterministically collects exactly n rows by round-robin interleaving across categories until n indices are gathered.
Integration
examples/specdec_bench/specdec_bench/datasets/speed.py (caller around lines 740–743)
_load_dataset(...) delegates to self._stratified_select(dataset, self.num_samples) when truncation is needed.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: implementing stratified sampling of requests across dataset categories in the SPEED-Bench dataset loader.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns ✅ Passed Pull request does not introduce any of the six critical security anti-patterns specified in SECURITY.md.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/specdec_bench/specdec_bench/datasets/speed.py`:
- Around line 765-777: The interleaving logic can produce fewer than n items
when some categories have fewer than per_cat rows; update the loop that builds
interleaved (currently using per_cat and cat_samples) to backfill by repeatedly
cycling through cat_samples and appending the next available item from each
category until interleaved has length n or all samples are exhausted, i.e.,
maintain per-category indices (or pop from lists) and in a while
len(interleaved) < n loop iterate over cat_samples appending one item per
non-empty category, then return dataset.select(interleaved[:n]) to guarantee
num_samples.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f07d27e0-5726-4c6e-ae35-bffedcb82b90

📥 Commits

Reviewing files that changed from the base of the PR and between 06ef935 and 9d05c40.

📒 Files selected for processing (1)
  • examples/specdec_bench/specdec_bench/datasets/speed.py

Comment thread examples/specdec_bench/specdec_bench/datasets/speed.py Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented May 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.44%. Comparing base (06ef935) to head (29cd86f).
⚠️ Report is 6 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1389      +/-   ##
==========================================
+ Coverage   76.90%   77.44%   +0.54%     
==========================================
  Files         471      471              
  Lines       50565    50565              
==========================================
+ Hits        38885    39161     +276     
+ Misses      11680    11404     -276     
Flag Coverage Δ
examples 41.55% <ø> (+0.89%) ⬆️
unit 52.80% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

@h-guo18 h-guo18 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take a look at CodeRabbit’s comment above. Other than that LGTM

Replace the two-step (per_cat clamp + interleave) selection with a
single-pass round-robin walk over the per-category index lists.
Smaller categories continue contributing until exhausted; larger
categories then fill the remainder. This guarantees exactly ``n``
rows whenever ``n`` does not exceed the total dataset size, where
the previous code could under-deliver if any category had fewer
than ``ceil(n / num_categories)`` rows.

The first ``n`` rows are still balanced (drop the same ``ceil`` /
``per_cat`` quantity); the only change is what happens past the
exhaustion point of the smallest category. ``math`` is no longer
needed.

Signed-off-by: Alexandre Milesi <milesial@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/specdec_bench/specdec_bench/datasets/speed.py`:
- Around line 745-772: In _stratified_select, handle non-positive n early: if n
<= 0 return dataset.select([]) (or an empty index sequence) before any category
processing so the round-robin loop never appends indices for n==0 or negative n;
locate the function _stratified_select and add this early guard immediately
after the docstring (before checking "category" in dataset.column_names) so all
subsequent logic assumes n > 0.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 38099a4c-3298-464c-901c-7a504580c435

📥 Commits

Reviewing files that changed from the base of the PR and between 9d05c40 and 29cd86f.

📒 Files selected for processing (1)
  • examples/specdec_bench/specdec_bench/datasets/speed.py

Comment thread examples/specdec_bench/specdec_bench/datasets/speed.py
@kevalmorabia97 kevalmorabia97 added the cherry-pick-0.44.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc label May 5, 2026
@h-guo18 h-guo18 merged commit 7097a69 into NVIDIA:main May 5, 2026
50 of 52 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cherry-pick-0.44.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants