Add timeout and memory limits to CodeEnvironment execution #2208

Draft
taivu1998 wants to merge 1 commit into NVIDIA-NeMo:main from taivu1998:tdv/issue-816-code-env-limits
@taivu1998

Summary

This PR adds execution limits to CodeEnvironment so coding rollouts can constrain untrusted code without taking down the Ray worker.

Closes #816.

What Changed

  • Added optional environment-level defaults: default_timeout_seconds and default_memory_limit_bytes.
  • Added per-sample timeout_seconds and memory_limit_bytes support in CodeEnvMetadata.
  • Moved code execution into a supervised child process instead of running user code inline inside the Ray actor.
  • Enforced timeouts in the child process, backed by a parent-side watchdog with a startup grace period.
  • Enforced memory limits via resource.RLIMIT_AS on Linux; unsupported platforms return a clear error observation.
  • Preserved multi-turn execution context across subprocess boundaries.
  • Normalized syntax, runtime, timeout, and memory failures into environment observations so the actor stays healthy after bad code.
  • Documented the new config and metadata contract in the environments guide.
  • Added focused unit coverage for limit resolution, syntax errors, timeouts, context persistence, and memory-limit recovery.
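
As an illustration of how the environment-level defaults and per-sample metadata might combine, here is a minimal sketch. The helper name `resolve_limits` is hypothetical, and the precedence rule (a per-sample value overrides the environment default, and `None` means "no limit") is an assumption rather than something this description states; only the four key names come from the PR.

```python
from typing import Any, Mapping, Optional, Tuple


def _pick(per_sample: Optional[float], default: Optional[float], name: str) -> Optional[float]:
    """Prefer the per-sample value; fall back to the environment default."""
    value = per_sample if per_sample is not None else default
    # Assumption: non-positive limits are rejected up front instead of being
    # silently treated as "unlimited".
    if value is not None and value <= 0:
        raise ValueError(f"{name} must be positive, got {value!r}")
    return value


def resolve_limits(
    env_config: Mapping[str, Any],
    metadata: Mapping[str, Any],
) -> Tuple[Optional[float], Optional[int]]:
    """Hypothetical helper: return (timeout_seconds, memory_limit_bytes)."""
    timeout = _pick(
        metadata.get("timeout_seconds"),
        env_config.get("default_timeout_seconds"),
        "timeout_seconds",
    )
    memory = _pick(
        metadata.get("memory_limit_bytes"),
        env_config.get("default_memory_limit_bytes"),
        "memory_limit_bytes",
    )
    return timeout, memory
```

With defaults of 10 seconds and 1 GiB, a sample that sets only `timeout_seconds` to 2.0 would resolve to a 2-second timeout and keep the 1 GiB memory default.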

Why

Issue #816 asks for timeout and RAM limits when training or evaluating coding tasks. Without those constraints, generated programs can resort to brute force, loop forever, or consume pathological amounts of memory, which distorts the training signal and can destabilize the execution environment.

Design Notes

  • The implementation keeps the public surface area small: limits are supplied through the existing environment config and extra_env_info metadata path.
  • The Ray actor remains responsible for orchestration, while the child process provides isolation for unsafe execution and failure recovery.
  • Timeout enforcement is split between an in-process execution timer and a parent watchdog: the configured limit applies to user code only, so subprocess startup does not spuriously trip very small limits.
  • Memory limiting is intentionally Linux-only because RLIMIT_AS is not reliable across platforms.
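
The supervised-child pattern described in these notes can be sketched roughly as follows. This is not the PR's implementation: `run_limited`, the result dictionary shape, and the grace-period default are hypothetical, and the sketch simplifies by using only a parent-side watchdog (`subprocess.run` with a timeout padded by the grace period) rather than the in-process timer the PR also uses. It also starts a fresh interpreter per call, so it does not carry multi-turn context.

```python
import subprocess
import sys

try:
    import resource  # available on POSIX platforms only
except ImportError:
    resource = None


def run_limited(code, timeout_seconds=None, memory_limit_bytes=None,
                startup_grace_seconds=1.0):
    """Run `code` in a child interpreter and report failures as observations."""
    preexec = None
    if memory_limit_bytes is not None:
        # Mirror the Linux-only policy: report an error instead of guessing.
        if resource is None or not sys.platform.startswith("linux"):
            return {"ok": False,
                    "observation": "Error: memory limits require Linux"}

        def preexec():
            # Runs in the child just before exec; caps its address space.
            resource.setrlimit(resource.RLIMIT_AS,
                               (memory_limit_bytes, memory_limit_bytes))

    # Pad the watchdog deadline so interpreter startup does not trip
    # very small limits.
    deadline = None
    if timeout_seconds is not None:
        deadline = timeout_seconds + startup_grace_seconds

    try:
        proc = subprocess.run([sys.executable, "-c", code],
                              capture_output=True, text=True,
                              timeout=deadline, preexec_fn=preexec)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing lingers.
        return {"ok": False,
                "observation": f"Timeout: execution exceeded {timeout_seconds}s"}

    if proc.returncode != 0:
        # The last stderr line usually names the exception (e.g. MemoryError).
        lines = proc.stderr.strip().splitlines()
        detail = lines[-1] if lines else f"exit code {proc.returncode}"
        return {"ok": False, "observation": f"Error: {detail}"}
    return {"ok": True, "observation": proc.stdout}
```

Because every failure mode comes back as an ordinary return value, the caller (the Ray actor, in the PR's design) never sees an exception from bad user code and can keep serving later requests.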

Validation

  • git diff --check
  • python3 -m py_compile nemo_rl/environments/code_environment.py tests/unit/environments/test_code_environment.py
  • Focused pytest run:
    RAY_ENABLE_UV_RUN_RUNTIME_ENV=0 PYTHONPATH=/tmp/codex-verification-stubs:/tmp/rl-issue-816-code-env /tmp/rl-issue-816-code-env/.venv-test/bin/python - <<'PY' ... pytest.main([... 'tests/unit/environments/test_code_environment.py', '-k', 'not vllm_execute_code']) ... PY
    Result: 7 passed, 1 skipped, 1 deselected in 44.17s

Notes

  • The skipped test is the Linux-only memory-limit test, which is expected on macOS.
  • The vLLM integration test was left deselected in local validation because the validation environment for this worktree does not include that optional stack.

copy-pr-bot bot commented Apr 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions bot added the documentation and community-request labels on Apr 3, 2026

Development

Successfully merging this pull request may close these issues.

Set timeout and RAM limit inside code environment

2 participants