
feat: numerical verification for RL distribution health #429

Open
RUFFY-369 wants to merge 4 commits into NousResearch:main from RUFFY-369:feat/numerical-verification


RUFFY-369 commented Mar 30, 2026

PR Type

  • Non-Environment PR - Complete Description, Related Issues & Type of Change sections

📝 General Information

Description

This is a hygiene/safety PR that adds NumericalVerification utilities. They are designed to catch dead rewards (collapse) and exploding advantages (NaN/Inf values or extreme magnitudes) in the training loop before they waste GPU hours.

The utilities generate a DistributionReport and log a warning when the rewards look biased or collapsed. I've integrated this into the wandb_log loop, so distribution health is visible on your dashboard in real time.
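
To make the idea concrete for reviewers, here is a minimal standalone sketch of a distribution health check. It is not the code in this PR: the field names, the `check_score_distribution` helper, the thresholds, and the `score_dist/...` metric keys are all assumptions chosen for illustration, and "biased" is approximated here as one value dominating the batch.

```python
import math
import statistics
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class DistributionReport:
    """Illustrative report shape; the fields in this PR may differ."""

    count: int
    mean: float
    std: float
    warnings: List[str] = field(default_factory=list)

    @property
    def healthy(self) -> bool:
        # Healthy means no check raised a warning.
        return not self.warnings

    def as_metrics(self, prefix: str = "score_dist") -> Dict[str, float]:
        # Flat dict of scalars; the key names are hypothetical.
        return {
            f"{prefix}/mean": self.mean,
            f"{prefix}/std": self.std,
            f"{prefix}/healthy": float(self.healthy),
        }


def check_score_distribution(
    scores: Sequence[float],
    collapse_std: float = 1e-6,  # assumed threshold for "collapsed"
    dominance: float = 0.95,     # assumed threshold for "biased"
) -> DistributionReport:
    finite = [s for s in scores if math.isfinite(s)]
    warnings: List[str] = []
    dropped = len(scores) - len(finite)
    if dropped:
        warnings.append(f"{dropped} non-finite score(s) ignored")
    if not finite:
        warnings.append("no finite scores")
        return DistributionReport(0, math.nan, math.nan, warnings)

    mean = statistics.fmean(finite)
    std = statistics.pstdev(finite)
    if len(finite) > 1:
        if std < collapse_std:
            warnings.append("collapsed: near-zero variance")
        else:
            # Share of the batch taken by the single most common value.
            top_share = Counter(finite).most_common(1)[0][1] / len(finite)
            if top_share >= dominance:
                warnings.append("biased: one value dominates the batch")
    return DistributionReport(len(finite), mean, std, warnings)
```

A dict like the one returned by `as_metrics()` could be merged into whatever metrics dict the logging loop already builds, which keeps the check itself free of any logging dependency.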

Related Issues

Type of Change

  • New feature (non-breaking change which adds functionality)
  • Code refactor (no functional changes)

✅ Developer & Reviewer Checklist

  • Code follows project style (black, isort, flake8 pass with pre-commit)
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • New and existing unit tests pass locally with my changes (25/25 verified)
  • Docstrings added for all new public classes / functions
  • If .env vars required, did you add it to the .env.example in repo root? (N/A)

RUFFY-369 and others added 4 commits March 28, 2026 03:48
Add numerical_verification.py to atroposlib/utils/ with:
- verify_reward_determinism: N-run reproducibility check
- verify_advantage_stability: NaN/Inf/magnitude detection
- compare_fp_precision: FP32/FP16/BF16 divergence analysis
- verify_score_distribution: collapse/explosion/bias detection
- Structured report dataclasses with diagnostic summaries
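
For readers skimming the commit, the two simplest checks in the list above can be sketched roughly as follows. This is an illustration under assumptions, not the implementation in numerical_verification.py: the function names, the `StabilityReport` fields, and the default thresholds are hypothetical, and the sketch works on plain Python floats rather than tensors.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class StabilityReport:
    """Hypothetical report shape for an advantage stability check."""

    nan_count: int
    inf_count: int
    max_abs: float
    stable: bool


def check_advantage_stability(
    advantages: Sequence[float],
    max_magnitude: float = 1e3,  # assumed cutoff for "exploding"
) -> StabilityReport:
    nan_count = sum(1 for a in advantages if math.isnan(a))
    inf_count = sum(1 for a in advantages if math.isinf(a))
    # Largest finite magnitude; 0.0 when nothing finite is present.
    max_abs = max((abs(a) for a in advantages if math.isfinite(a)), default=0.0)
    stable = nan_count == 0 and inf_count == 0 and max_abs <= max_magnitude
    return StabilityReport(nan_count, inf_count, max_abs, stable)


def check_reward_determinism(
    reward_fn: Callable[[], float],
    n_runs: int = 5,
    tol: float = 0.0,  # exact match by default
) -> bool:
    """Call reward_fn n_runs times and compare every result to the first."""
    if n_runs < 2:
        raise ValueError("n_runs must be at least 2 to compare runs")
    results = [reward_fn() for _ in range(n_runs)]
    # A NaN result fails the comparison, so it counts as non-deterministic.
    return all(abs(r - results[0]) <= tol for r in results)
```

Comparing every run against the first one keeps the determinism check linear in `n_runs`; a nonzero `tol` would be the knob for reward functions that are only reproducible up to floating-point noise.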

25/25 tests passing.