
feat(env): integrate CodeDebug environment and vLLM inference stabilization #3448

Open
RUFFY-369 wants to merge 19 commits into NousResearch:main from RUFFY-369:feat/code-debug-agent-env

Conversation

@RUFFY-369 RUFFY-369 commented Mar 27, 2026

Note

Research Context: This PR integrates the environment provided in atropos (PR #421) and establishes the foundation for the MT-GRPO reward infrastructure in PR #3451.

What does this PR do?

This PR integrates the CodeDebug environment for agentic reasoning and introduces a Universal Tool Strategy to stabilize inference on vLLM backends.

It solves two primary problems:

  1. Lack of Agentic Debugging Benchmarks
    Provides a high-fidelity sandbox where agents iteratively debug code using real terminal and file tools.

  2. vLLM Tool Parsing Bugs
    Implements a robust client-side tool parsing fallback in model_tools.py and environments/agent_loop.py to bypass known bugs in vLLM 0.6.5 that cause 400/500 errors during complex tool-use sessions.
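To make the fallback concrete, here is a minimal sketch of what a regex-based client-side parser along the lines of `parse_tool_calls_from_text` could look like. The function name and the `<tool_code>` tag come from this PR; the exact signature, JSON payload shape, and error handling below are illustrative assumptions, not the actual implementation in `model_tools.py`.

```python
import json
import re

# Match JSON payloads wrapped in <tool_code> tags in raw model output.
# (Tag name taken from this PR; payload format is an assumption.)
TOOL_CODE_RE = re.compile(r"<tool_code>\s*(.*?)\s*</tool_code>", re.DOTALL)


def parse_tool_calls_from_text(text: str) -> list[dict]:
    """Extract tool-call dicts from <tool_code> tags when server-side
    parsing fails (e.g. the vLLM 0.6.5 400/500 errors described above)."""
    calls = []
    for payload in TOOL_CODE_RE.findall(text):
        try:
            calls.append(json.loads(payload))
        except json.JSONDecodeError:
            # Skip malformed payloads rather than failing the whole turn.
            continue
    return calls
```

Because it only inspects the completion text, a fallback like this can run after any backend response, regardless of whether the server's own tool parser succeeded.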


Related Issue

Fixes # (Initial integration of debugging environment)


Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • environments/agent_loop.py: Added client-side tool parsing loop and atropos_inhibit_tools support
  • model_tools.py: Added parse_tool_calls_from_text (regex-based fallback parser)
  • environments/code_debug_env/code_debug_env.py: Environment logic based on HumanEvalFix
  • environments/code_debug_env/default.yaml: Configuration for agent prompts and tool inhibition
  • environments/code_debug_env/README.md: Usage and stabilization documentation

How to Test

  1. Rollout Verification

    python environments/code_debug_env/code_debug_env.py process
  2. vLLM Stability

    • Verify that complex tool calls are correctly extracted from <tool_code> tags
    • Confirm fallback parsing works when server-side parsing fails
  3. Mock Trajectories

    • Verify that tool calls (terminal/file) are correctly dispatched to the sandbox
    • Ensure multi-step debugging runs complete without errors
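The dispatch step in item 3 can be sketched as follows. The handler names, registry, and return format are illustrative assumptions; the actual routing lives in `environments/agent_loop.py`.

```python
from typing import Callable


def run_terminal(arguments: dict) -> str:
    # Placeholder sandbox handler; the real environment executes the
    # command in an isolated terminal and returns its output.
    return f"ran: {arguments.get('cmd', '')}"


def read_file(arguments: dict) -> str:
    # Placeholder file-tool handler.
    return f"read: {arguments.get('path', '')}"


# Registry mapping tool names to sandbox handlers (names are assumptions).
DISPATCH: dict[str, Callable[[dict], str]] = {
    "terminal": run_terminal,
    "read_file": read_file,
}


def dispatch_tool_calls(calls: list[dict]) -> list[str]:
    """Route each parsed tool call to its handler, collecting the
    observations fed back to the agent on the next turn."""
    results = []
    for call in calls:
        handler = DISPATCH.get(call.get("name", ""))
        if handler is None:
            results.append(f"error: unknown tool {call.get('name')!r}")
        else:
            results.append(handler(call.get("arguments", {})))
    return results
```

Returning an error string for unknown tools (rather than raising) lets a multi-step debugging run continue even when the model emits a bad call.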

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass (Note: system-level version conflict in env, but rollout verified)
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: Ubuntu 22.04

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings)
  • I've updated cli-config.yaml.example — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md — or N/A
  • I've considered cross-platform impact (Windows, macOS) — or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

For New Skills

N/A — This is an RL training environment integration.


Screenshots / Logs

  • Environment successfully executes multi-turn debugging sessions
  • Real terminal feedback verified
  • Running on Port 9001

cc @teknium1
