feat: API performance tracking and final infra integration by RUFFY-369 · Pull Request #430 · NousResearch/atropos

RUFFY-369 · 2026-03-30T21:08:01Z

PR Type

Non-Environment PR - Complete Description, Related Issues & Type of Change sections

📝 General Information

Description

This PR integrates the work from the stabilization sprint into BaseEnv. The main addition is APIPerformanceTracker, a high-resolution tracker for throughput and latency bottlenecks between the trainer and the inference nodes.

It tracks rolling p50/p95/p99 latencies and items_per_sec throughput. I also fixed a critical bug in BaseEnv.wandb_log where metrics from multiple servers overwrote each other instead of being aggregated. This branch was the final one verified on Vast.ai and is confirmed compatible with downstream hermes-agent environments.
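For reviewers, here is a minimal sketch of the overwrite-versus-aggregate difference behind the wandb_log fix. It is not the code from this PR: the function names are hypothetical, and reducing with a mean is an assumption made only for illustration.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, List


def merge_overwriting(per_server: List[Dict[str, float]]) -> Dict[str, float]:
    """Buggy pattern: each server's dict replaces earlier values for the same key."""
    merged: Dict[str, float] = {}
    for metrics in per_server:
        merged.update(metrics)  # only the last server's value survives
    return merged


def merge_aggregating(per_server: List[Dict[str, float]]) -> Dict[str, float]:
    """Collect every server's value per key, then reduce (a mean is assumed here)."""
    buckets: Dict[str, List[float]] = defaultdict(list)
    for metrics in per_server:
        for key, value in metrics.items():
            buckets[key].append(value)
    return {key: fmean(values) for key, values in buckets.items()}
```

With two servers reporting a latency of 1.0 and 3.0 under the same key, the first function logs 3.0 while the second logs 2.0.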

Related Issues

Part of [Enhancement] RL Training Infrastructure Stabilization & Observability #431 (RL Infrastructure Enhancements)
Depends on:
- feat: reward ensembles and inter-rater reliability metrics #426
- feat: online reward normalization (Welford’s algorithm) #427
- feat: difficulty based curriculum sampling strategy #428
- feat: numerical verification for RL distribution health #429

Type of Change

Bug fix (fixed multi-server wandb logging aggregation)
New feature (non-breaking change which adds functionality)
This change requires a documentation update

✅ Developer & Reviewer Checklist

Code follows project style (black, isort, flake8 pass with pre-commit)
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
My changes generate no new warnings
New and existing unit tests pass locally with my changes (Verified on RTX 3090)
Docstrings added for all new public classes / functions
If .env vars required, did you add it to the .env.example in repo root? (N/A)

…ability

Add EnsembleReward to atroposlib/envs/reward_fns/ with:
- Multiple aggregation strategies: mean, median, min, majority_vote
- Krippendorff's alpha inter-rater reliability metric
- Per-item disagreement tracking for reward hacking detection
- Full integration with RewardRegistry

17/17 tests passing.
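A rough sketch of how the four aggregation strategies could be dispatched, for readers who have not opened #426. Everything here is an assumption for illustration: the names `aggregate`, `majority_vote` and `disagreement`, the 0.5 vote threshold, and the use of score range as the disagreement measure are not taken from EnsembleReward. Krippendorff's alpha is left out.

```python
from collections import Counter
from statistics import fmean, median
from typing import Callable, Dict, Sequence


def majority_vote(scores: Sequence[float], threshold: float = 0.5) -> float:
    """Binarize each score at `threshold` and return the winning vote (ties give 0.0)."""
    votes = Counter(1.0 if s >= threshold else 0.0 for s in scores)
    return 1.0 if votes[1.0] > votes[0.0] else 0.0


# Hypothetical strategy table keyed by the names listed in the commit message.
AGGREGATORS: Dict[str, Callable[[Sequence[float]], float]] = {
    "mean": fmean,
    "median": median,
    "min": min,
    "majority_vote": majority_vote,
}


def aggregate(scores: Sequence[float], strategy: str = "mean") -> float:
    """Combine the scores of several reward functions for a single item."""
    if not scores:
        raise ValueError("need at least one score")
    return float(AGGREGATORS[strategy](scores))


def disagreement(scores: Sequence[float]) -> float:
    """Score range for one item; a large spread may be worth inspecting."""
    return max(scores) - min(scores)
```

Taking `min` is the most conservative choice, since an item only scores well when every reward function agrees.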

…lity

Add RewardNormalizer to atroposlib/envs/ with:
- Welford's online algorithm for running mean/variance (no data storage)
- Z-score and min-max normalization modes
- Configurable reward clipping and warmup period
- Checkpoint save/load support
- Opt-in integration in BaseEnv via 3 new config fields
- WandB metrics for normalization statistics

21/21 tests passing.
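For context, Welford's update keeps a running mean and sum of squared deviations without storing past rewards. The sketch below shows that update with z-score normalization, clipping and a warmup guard. The class name `RunningStats`, the population-variance choice and the default `clip`, `warmup` and `eps` values are assumptions, not the RewardNormalizer API.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Welford's single-pass update; no past samples are stored."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Population variance (assumed here; sample variance divides by count - 1)."""
        return self.m2 / self.count if self.count > 0 else 0.0

    def normalize(self, x: float, clip: float = 5.0, warmup: int = 2,
                  eps: float = 1e-8) -> float:
        """Z-score with clipping; rewards pass through unchanged during warmup."""
        if self.count < warmup:
            return x
        z = (x - self.mean) / (math.sqrt(self.variance) + eps)
        return max(-clip, min(clip, z))
```

Checkpointing such a normalizer only requires saving `count`, `mean` and `m2`.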

Add CurriculumScheduler to atroposlib/envs/ with:
- EMA-based per-item difficulty tracking from reward signals
- Quantile-based difficulty binning (configurable N bins)
- Three sampling strategies: uniform, easy_first, competence_based
- Competence-based strategy cites Platanios et al. 2019
- Opt-in integration in BaseEnv via 3 config fields
- WandB metrics for difficulty distribution tracking
- Checkpoint save/load support

22/22 tests passing.
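To make the first two bullets concrete, here is a small sketch of EMA difficulty tracking followed by quantile binning. It assumes rewards lie in [0, 1] and defines difficulty as `1 - reward`; the class name `DifficultyTracker` and the `alpha` default are hypothetical and may differ from CurriculumScheduler. The sampling strategies are not shown.

```python
from bisect import bisect_right
from statistics import quantiles
from typing import Dict


class DifficultyTracker:
    def __init__(self, alpha: float = 0.3) -> None:
        self.alpha = alpha  # EMA weight given to the newest observation
        self.difficulty: Dict[str, float] = {}

    def update(self, item_id: str, reward: float) -> float:
        """Blend the new observation into the item's running difficulty."""
        observed = 1.0 - reward  # assumption: low reward means a hard item
        prev = self.difficulty.get(item_id)
        if prev is None:
            self.difficulty[item_id] = observed
        else:
            self.difficulty[item_id] = self.alpha * observed + (1 - self.alpha) * prev
        return self.difficulty[item_id]

    def bins(self, n_bins: int = 3) -> Dict[str, int]:
        """Assign each item a bin index in [0, n_bins - 1] using quantile cut points."""
        values = list(self.difficulty.values())
        if len(values) < 2:
            return {item_id: 0 for item_id in self.difficulty}
        cuts = quantiles(values, n=n_bins)  # yields n_bins - 1 cut points
        return {item_id: bisect_right(cuts, d) for item_id, d in self.difficulty.items()}
```

An easy_first style sampler could then draw from bin 0 before moving to higher bins.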

Add APIPerformanceTracker to atroposlib/utils/ with:
- Rolling window latency stats (p50/p95/p99)
- Throughput monitoring (items/sec, requests/sec)
- Compression ratio and payload size tracking
- Automatic slow-request warnings
- Integration in BaseEnv (init, _send_scored_data_to_api, wandb_log)

7/7 tests passing.
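The rolling-window statistics can be pictured with the sketch below. It is a simplified stand-in, not APIPerformanceTracker itself: the class name `RollingLatency`, the nearest-rank percentile method, the 5-second slow threshold, and computing throughput from summed request time (which assumes sequential requests) are all assumptions.

```python
import math
from collections import deque
from typing import Deque


class RollingLatency:
    def __init__(self, window: int = 1000, slow_threshold_s: float = 5.0) -> None:
        # deque(maxlen=...) drops the oldest sample once the window is full
        self.samples: Deque[float] = deque(maxlen=window)
        self.slow_threshold_s = slow_threshold_s
        self.items = 0
        self.elapsed_s = 0.0

    def record(self, latency_s: float, n_items: int = 1) -> bool:
        """Store one request; return True when it should trigger a slow-request warning."""
        self.samples.append(latency_s)
        self.items += n_items
        self.elapsed_s += latency_s
        return latency_s > self.slow_threshold_s

    def percentile(self, p: float) -> float:
        """Nearest-rank percentile over the current window (0.0 when empty)."""
        if not self.samples:
            return 0.0
        ordered = sorted(self.samples)
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]

    def items_per_sec(self) -> float:
        """Items per second of request time; assumes requests do not overlap."""
        return self.items / self.elapsed_s if self.elapsed_s > 0 else 0.0
```

Calling `percentile(50)`, `percentile(95)` and `percentile(99)` before each wandb_log step would give the p50/p95/p99 values described above.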


RUFFY-369 and others added 14 commits March 28, 2026 03:22

- style: fix linting and imports in curriculum scheduler (38731e0)
- fix: pin antlr4-python3-runtime for compatibility (95c3e49)
- style: fix lints and pin dependencies for reward normalization (8a3a582)
- style: fix lints and back-port stabilization fixes for trainer optimization (dedea30)
- style: resolve merge conflicts and back-port stabilization fixes for integration (3d08cdf)
- chore: merge ensemble for integration (76e8bf7)
- style: finalize integration of all RL features with stabilization fixes (c90a74e)
- merge ensemble (808a335)
- Merge branch 'NousResearch:main' into feat/trainer-inference-optimization (6825974)
- [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci (271a242)

RUFFY-369 mentioned this pull request Mar 30, 2026

[Enhancement] RL Training Infrastructure Stabilization & Observability #431

Open

RUFFY-369 wants to merge 14 commits into NousResearch:main from RUFFY-369:feat/trainer-inference-optimization
