feat: API performance tracking and final infra integration #430

Open
RUFFY-369 wants to merge 14 commits into NousResearch:main from RUFFY-369:feat/trainer-inference-optimization

RUFFY-369 commented Mar 30, 2026

PR Type

  • Non-Environment PR - Complete Description, Related Issues & Type of Change sections

📝 General Information

Description

This PR consolidates the work from the stabilization sprint into BaseEnv. The main addition is APIPerformanceTracker, which monitors throughput and latency between the trainer and inference nodes so that bottlenecks can be located.

The tracker reports rolling p50/p95/p99 latencies and items_per_sec throughput. I also fixed a critical bug in BaseEnv.wandb_log where metrics from multiple servers were overwritten instead of aggregated. This branch was the last one verified on Vast.ai and is confirmed compatible with downstream hermes-agent environments.
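
To illustrate the class of bug described above, here is a minimal sketch that contrasts an overwriting merge with an aggregating one. It is not the code in BaseEnv.wandb_log: the function names, the metric keys, and the choice of averaging (rather than summing or logging per-server keys) are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import fmean


def overwrite_merge(per_server):
    """Buggy pattern: later servers clobber earlier ones on shared keys."""
    merged = {}
    for metrics in per_server:
        merged.update(metrics)
    return merged


def aggregate_merge(per_server):
    """Collect every server's value per key, then average them.

    Averaging is an assumption for this sketch; a sum or per-server
    key prefix would be equally valid aggregation choices.
    """
    buckets = defaultdict(list)
    for metrics in per_server:
        for key, value in metrics.items():
            buckets[key].append(value)
    return {key: fmean(values) for key, values in buckets.items()}


# Hypothetical metrics reported by two inference servers.
per_server = [
    {"server/latency_ms": 100.0, "server/requests": 10.0},
    {"server/latency_ms": 300.0, "server/requests": 30.0},
]

print(overwrite_merge(per_server))  # only the last server's values survive
print(aggregate_merge(per_server))  # each key reflects both servers
```

With the overwriting merge, the first server's measurements never reach the dashboard, which is why a multi-server run can look healthier or worse than it really is.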

Related Issues

Type of Change

  • Bug fix (fixed multi-server wandb logging aggregation)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

✅ Developer & Reviewer Checklist

  • Code follows project style (black, isort, flake8 pass with pre-commit)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • New and existing unit tests pass locally with my changes (Verified on RTX 3090)
  • Docstrings added for all new public classes / functions
  • If .env vars required, did you add it to the .env.example in repo root? (N/A)

RUFFY-369 and others added 14 commits March 28, 2026 03:22
…ability

Add EnsembleReward to atroposlib/envs/reward_fns/ with:
- Multiple aggregation strategies: mean, median, min, majority_vote
- Krippendorff's alpha inter-rater reliability metric
- Per-item disagreement tracking for reward hacking detection
- Full integration with RewardRegistry

17/17 tests passing.
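
The four aggregation strategies listed in this commit can be sketched roughly as follows. This is an illustration only, not the EnsembleReward implementation: `aggregate_scores`, `disagreement`, the 0.5 vote threshold, and the use of population standard deviation as the disagreement measure are assumptions. Krippendorff's alpha is not shown.

```python
from statistics import fmean, median, pstdev


def aggregate_scores(scores, strategy="mean", threshold=0.5):
    """Combine one item's scores from several reward functions."""
    if not scores:
        raise ValueError("scores must be non-empty")
    if strategy == "mean":
        return fmean(scores)
    if strategy == "median":
        return float(median(scores))
    if strategy == "min":
        # Most pessimistic rater wins; conservative against reward hacking.
        return float(min(scores))
    if strategy == "majority_vote":
        # Assumption: each score is binarized at `threshold`, and the item
        # passes only if strictly more than half of the raters vote yes.
        votes = sum(1 for s in scores if s >= threshold)
        return 1.0 if votes * 2 > len(scores) else 0.0
    raise ValueError(f"unknown strategy: {strategy}")


def disagreement(scores):
    """Per-item spread across raters (population standard deviation)."""
    return pstdev(scores)


scores = [0.0, 1.0, 1.0]
for name in ("mean", "median", "min", "majority_vote"):
    print(name, aggregate_scores(scores, name))
print("disagreement", disagreement(scores))
```

Items where the raters diverge strongly are natural candidates for manual inspection, since a high score from a single reward function can indicate that the policy has learned to exploit it.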
…lity

Add RewardNormalizer to atroposlib/envs/ with:
- Welford's online algorithm for running mean/variance (no data storage)
- Z-score and min-max normalization modes
- Configurable reward clipping and warmup period
- Checkpoint save/load support
- Opt-in integration in BaseEnv via 3 new config fields
- WandB metrics for normalization statistics

21/21 tests passing.
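
For readers unfamiliar with Welford's online algorithm, the sketch below shows how a running mean and variance can be maintained without storing past rewards, followed by a clipped z-score. The class name `RunningStats` and the `clip`, `warmup`, and `eps` parameters are hypothetical and are not taken from RewardNormalizer.

```python
import math


class RunningStats:
    """Running mean/variance via Welford's online algorithm."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        # Uses the old delta and the already-updated mean.
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Population variance; divide by (count - 1) for the sample variance.
        return self._m2 / self.count if self.count > 0 else 0.0

    def zscore(self, x, clip=5.0, warmup=2, eps=1e-8):
        """Return a clipped z-score, or x unchanged during warmup."""
        if self.count < warmup:
            return x
        z = (x - self.mean) / (math.sqrt(self.variance) + eps)
        return max(-clip, min(clip, z))


stats = RunningStats()
for reward in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(reward)
print(stats.mean, stats.variance, stats.zscore(9.0))
```

Because only `count`, `mean`, and `_m2` are kept, the state is three numbers, which is also what makes checkpoint save/load cheap.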
Add CurriculumScheduler to atroposlib/envs/ with:
- EMA-based per-item difficulty tracking from reward signals
- Quantile-based difficulty binning (configurable N bins)
- Three sampling strategies: uniform, easy_first, competence_based
- Competence-based strategy cites Platanios et al. 2019
- Opt-in integration in BaseEnv via 3 config fields
- WandB metrics for difficulty distribution tracking
- Checkpoint save/load support

22/22 tests passing.
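
The difficulty tracking and binning described in this commit might look roughly like the sketch below. It assumes rewards lie in [0, 1] and defines difficulty as `1 - reward`; the function names and default values are hypothetical, and the binning shown is a simple rank-based equal-frequency split. The `competence` function follows my reading of the square-root schedule in Platanios et al. 2019 and may differ from what CurriculumScheduler does.

```python
import math


def update_difficulty(difficulty, item_id, reward, alpha=0.1):
    """EMA update of one item's difficulty from a new reward in [0, 1]."""
    observed = 1.0 - reward  # assumption: low reward means a hard item
    previous = difficulty.get(item_id)
    if previous is None:
        difficulty[item_id] = observed
    else:
        difficulty[item_id] = (1.0 - alpha) * previous + alpha * observed
    return difficulty[item_id]


def assign_bins(difficulty, n_bins=3):
    """Rank items by difficulty and split them into equal-frequency bins."""
    ranked = sorted(difficulty, key=difficulty.get)
    return {
        item: min(rank * n_bins // len(ranked), n_bins - 1)
        for rank, item in enumerate(ranked)
    }


def competence(step, total_steps, c0=0.1):
    """Square-root competence schedule: the fraction of the
    difficulty-sorted data that is eligible for sampling at `step`."""
    return min(1.0, math.sqrt(step * (1.0 - c0 ** 2) / total_steps + c0 ** 2))


difficulty = {}
update_difficulty(difficulty, "item-0", reward=1.0, alpha=0.5)
update_difficulty(difficulty, "item-0", reward=0.0, alpha=0.5)
print(difficulty)
print(assign_bins({"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4, "e": 0.5, "f": 0.6}))
print(competence(0, 100), competence(100, 100))
```

An `easy_first` strategy would then draw from bin 0 before the others, while a competence-based one would sample only from items whose difficulty rank falls below the current competence value.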
Add APIPerformanceTracker to atroposlib/utils/ with:
- Rolling window latency stats (p50/p95/p99)
- Throughput monitoring (items/sec, requests/sec)
- Compression ratio and payload size tracking
- Automatic slow-request warnings
- Integration in BaseEnv (init, _send_scored_data_to_api, wandb_log)

7/7 tests passing.
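
As a rough picture of what a rolling-window tracker of this kind involves, the sketch below keeps the most recent requests in a bounded deque and derives percentiles and throughput from them. It is not the APIPerformanceTracker API: the class name, the window size, the slow-request threshold, the nearest-rank percentile method, and measuring throughput against summed request time (which assumes sequential requests) are all assumptions.

```python
import math
from collections import deque


class RollingLatency:
    """Rolling-window latency and throughput statistics."""

    def __init__(self, window=1000, slow_threshold_s=5.0):
        # Each sample is (latency in seconds, items in the request).
        self.samples = deque(maxlen=window)
        self.slow_threshold_s = slow_threshold_s
        self.slow_requests = 0

    def record(self, latency_s, n_items=1):
        self.samples.append((latency_s, n_items))
        if latency_s > self.slow_threshold_s:
            self.slow_requests += 1  # a real tracker might log a warning

    def percentile(self, q):
        """Nearest-rank percentile of the windowed latencies (q in 0-100)."""
        latencies = sorted(latency for latency, _ in self.samples)
        if not latencies:
            return 0.0
        rank = max(1, math.ceil(q * len(latencies) / 100))
        return latencies[rank - 1]

    def items_per_sec(self):
        """Items divided by total request time within the window."""
        busy = sum(latency for latency, _ in self.samples)
        items = sum(n for _, n in self.samples)
        return items / busy if busy > 0 else 0.0


tracker = RollingLatency(window=100, slow_threshold_s=0.9)
for i in range(1, 101):
    tracker.record(i / 100, n_items=4)
print(tracker.percentile(50), tracker.percentile(95), tracker.percentile(99))
print(tracker.items_per_sec(), tracker.slow_requests)
```

The bounded deque keeps memory constant and lets old samples age out, so the reported percentiles reflect recent behaviour rather than the whole run.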