Skip to content

fix(orchestrator): prefer running sandbox on host IP lookups#2387

Draft
dobrac wants to merge 1 commit intomainfrom
codex/address-vulnerability-with-stopping-sandboxes
Draft

fix(orchestrator): prefer running sandbox on host IP lookups#2387
dobrac wants to merge 1 commit intomainfrom
codex/address-vulnerability-with-stopping-sandboxes

Conversation

@dobrac
Copy link
Copy Markdown
Contributor

@dobrac dobrac commented Apr 14, 2026

Motivation

  • Prevent a race where GetByHostPort can return a stopping (stale) sandbox while its IP slot has been returned and reused, which can lead to cross-tenant volume access or misattributed traffic.
  • The root cause is that the map retained stopping sandboxes for IP-based lookups and GetByHostPort returned the first IP match without preferring running entries.

Description

  • Update Map.GetByHostPort to prefer StatusRunning sandboxes and only fall back to a non-running sandbox when no running match exists.
  • Keep the fallback behavior so existing stopping-phase lookups still work when no running sandbox is present.
  • Add unit tests in packages/orchestrator/pkg/sandbox/map_test.go that verify GetByHostPort prefers a running sandbox over a stopping one and still returns a stopping sandbox when it is the only match.

Testing

  • Added unit tests TestGetByHostPortPrefersRunningSandbox and TestGetByHostPortFallsBackToStoppingSandbox in packages/orchestrator/pkg/sandbox/map_test.go to validate the lookup behavior.
  • Attempted to run go test ./packages/orchestrator/pkg/sandbox -run TestGetByHostPort -count=1 in the current environment but execution failed because the go binary is not installed (/bin/bash: line 1: go: command not found); these tests are expected to run in CI or a local Go toolchain and should pass there.

Codex Task


Open with Devin

@cursor
Copy link
Copy Markdown

cursor bot commented Apr 14, 2026

PR Summary

High Risk
Changes IP-based sandbox resolution logic used for routing/volume access, so any regression could misattribute traffic or reintroduce cross-tenant access. Behavior now depends on sandbox status ordering, which may surface edge cases if multiple non-running sandboxes share an IP.

Overview
Updates Map.GetByHostPort to return a StatusRunning sandbox when multiple sandboxes share the same host IP, only falling back to a non-running match if no running sandbox exists, and adds unit tests to cover both the preferred-running and fallback-only scenarios.

Reviewed by Cursor Bugbot for commit ac071b1. Bugbot is set up for automated code reviews on this repo. Configure here.

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit ac071b1. Configure here.


if sbx != running {
t.Fatalf("expected running sandbox, got %#v", sbx)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests use manual t.Fatalf instead of testify require

Low Severity

The new tests use manual if err != nil { t.Fatalf(...) } and if sbx != running { t.Fatalf(...) } patterns instead of require.NoError(t, err) and require.Same(t, expected, actual) from github.com/stretchr/testify/require. The sibling test file block/cache_test.go in the same package already uses require consistently.

Additional Locations (1)
Fix in Cursor Fix in Web

Triggered by learned rule: Use testify require/assert instead of manual t.Errorf in Go tests

Reviewed by Cursor Bugbot for commit ac071b1. Configure here.

return sbx, nil
}

if fallback == nil {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description says the IP slot can be freed while the stopping sandbox is still in the map, allowing a new sandbox to be assigned that same IP while in StatusStarting. In that window there are two non-running entries for the same IP — one StatusStopping, one StatusStarting. The current fallback stores the first one found in map iteration order (non-deterministic), so the NFS proxy, TCP firewall, and log handlers can still misattribute traffic from the old VM to the new sandbox.

Preferring StatusStopping over StatusStarting in the fallback would be deterministic and correct, since the stopping VM is the one actually making those connections. Consider updating the fallback condition to: if fallback == nil || the current candidate's status is StatusStopping, then assign it as fallback.

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.

Open in Devin Review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants