fix(orchestrator): prefer running sandbox on host IP lookups#2387
PR Summary: High Risk. Reviewed by Cursor Bugbot for commit ac071b1.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
if sbx != running {
	t.Fatalf("expected running sandbox, got %#v", sbx)
}
Tests use manual t.Fatalf instead of testify require
Low Severity
The new tests use manual if err != nil { t.Fatalf(...) } and if sbx != running { t.Fatalf(...) } patterns instead of require.NoError(t, err) and require.Same(t, expected, actual) from github.com/stretchr/testify/require. The sibling test file block/cache_test.go in the same package already uses require consistently.
Additional Locations (1)
Triggered by learned rule: Use testify require/assert instead of manual t.Errorf in Go tests
	return sbx, nil
}

if fallback == nil {
The PR description says the IP slot can be freed while the stopping sandbox is still in the map, allowing a new sandbox to be assigned that same IP while in StatusStarting. In that window there are two non-running entries for the same IP — one StatusStopping, one StatusStarting. The current fallback stores the first one found in map iteration order (non-deterministic), so the NFS proxy, TCP firewall, and log handlers can still misattribute traffic from the old VM to the new sandbox.
Preferring StatusStopping over StatusStarting in the fallback would be deterministic and correct, since the stopping VM is the one actually making those connections. Consider updating the fallback condition to: if fallback == nil || the current candidate's status is StatusStopping, then assign it as fallback.
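The suggested fallback rule can be sketched as follows. This is a minimal illustration and not the PR's code: the `Status` and `Sandbox` types, the `HostIP` field, and the `pickByHostIP` helper are hypothetical stand-ins, and only the `StatusStarting`, `StatusRunning`, and `StatusStopping` names come from the discussion above.

```go
package main

import "fmt"

// Status is a hypothetical stand-in for the sandbox lifecycle state.
type Status int

const (
	StatusStarting Status = iota
	StatusRunning
	StatusStopping
)

// Sandbox is a simplified, hypothetical model of a tracked sandbox.
type Sandbox struct {
	ID     string
	HostIP string
	Status Status
}

// pickByHostIP returns the running sandbox for ip if one exists. Otherwise it
// falls back to a non-running match and lets a StatusStopping entry replace
// any earlier fallback, so the result does not depend on map iteration order
// as long as at most one stopping entry matches.
func pickByHostIP(sandboxes map[string]*Sandbox, ip string) (*Sandbox, error) {
	var fallback *Sandbox
	for _, sbx := range sandboxes {
		if sbx.HostIP != ip {
			continue
		}
		if sbx.Status == StatusRunning {
			return sbx, nil
		}
		if fallback == nil || sbx.Status == StatusStopping {
			fallback = sbx
		}
	}
	if fallback == nil {
		return nil, fmt.Errorf("no sandbox found for host IP %s", ip)
	}
	return fallback, nil
}

func main() {
	// Two non-running entries share one IP: the window described above.
	sandboxes := map[string]*Sandbox{
		"new": {ID: "new", HostIP: "10.0.0.2", Status: StatusStarting},
		"old": {ID: "old", HostIP: "10.0.0.2", Status: StatusStopping},
	}
	sbx, err := pickByHostIP(sandboxes, "10.0.0.2")
	if err != nil {
		panic(err)
	}
	fmt.Println(sbx.ID)
}
```

With a plain `fallback == nil` check, the same input could return either entry depending on which one the map iteration visits first.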


Motivation
- GetByHostPort can return a stopping (stale) sandbox while its IP slot has been returned and reused, which can lead to cross-tenant volume access or misattributed traffic.
- GetByHostPort returned the first IP match without preferring running entries.

Description
- Map.GetByHostPort now prefers StatusRunning sandboxes and only falls back to a non-running sandbox when no running match exists.
- Unit tests in packages/orchestrator/pkg/sandbox/map_test.go verify that GetByHostPort prefers a running sandbox over a stopping one and still returns a stopping sandbox when it is the only match.

Testing
- TestGetByHostPortPrefersRunningSandbox and TestGetByHostPortFallsBackToStoppingSandbox in packages/orchestrator/pkg/sandbox/map_test.go validate the lookup behavior.
- go test ./packages/orchestrator/pkg/sandbox -run TestGetByHostPort -count=1 was attempted in the current environment, but execution failed because the go binary is not installed (/bin/bash: line 1: go: command not found); the tests are expected to run in CI or with a local Go toolchain and should pass there.