
Support CPU template configuration for mixed-CPU pools #2338

@feifeigood


Problem

When self-hosting E2B on a cluster with mixed CPU generations (e.g., Intel Xeon Silver 4216 Cascade Lake + 4314 Ice Lake), processes inside guest VMs crash with SIGILL (illegal instruction, exit code 132).

Root cause: The orchestrator does not set CPUTemplate in Firecracker's MachineConfiguration, so it defaults to None. Without a template, Firecracker passes most of the host's CPUID feature flags through to the guest. When a template is built on an Ice Lake node (which exposes avx512_vbmi2, gfni, vaes, etc.) and a sandbox created from it is later scheduled on a Cascade Lake node (which lacks these instruction sets), any binary that detects and uses these instructions (e.g., Node.js v22) crashes with SIGILL.

dmesg on host:

traps: node[2562] trap invalid opcode ip:257c2dd sp:7fffbf87f9c0 error:0 in node[e33000+2489000]
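One quick way to confirm this root cause on a given pair of nodes is to diff the flags line of /proc/cpuinfo from the build node against the run node. The Go program below is a minimal sketch: parseFlags and missingFlags are hypothetical helper names, and the flag strings in main are abbreviated examples, not real host output.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// parseFlags extracts feature flags from one "flags : ..." line
// as printed in /proc/cpuinfo on x86 Linux hosts.
func parseFlags(line string) map[string]bool {
	flags := make(map[string]bool)
	// Drop everything up to and including the colon, if present.
	if i := strings.Index(line, ":"); i >= 0 {
		line = line[i+1:]
	}
	for _, f := range strings.Fields(line) {
		flags[f] = true
	}
	return flags
}

// missingFlags returns, sorted, the flags present on the source host
// (where the template was built) but absent on the target host.
func missingFlags(source, target string) []string {
	src, dst := parseFlags(source), parseFlags(target)
	var missing []string
	for f := range src {
		if !dst[f] {
			missing = append(missing, f)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Abbreviated, illustrative flag lines; real hosts list many more flags.
	iceLake := "flags : sse4_2 avx2 avx512f avx512_vnni avx512_vbmi2 gfni vaes"
	cascadeLake := "flags : sse4_2 avx2 avx512f avx512_vnni"
	fmt.Println(missingFlags(iceLake, cascadeLake))
}
```

With the sample lines above it prints [avx512_vbmi2 gfni vaes]; any non-empty result means a guest built on the first host can see instructions the second host cannot execute.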

Current Code

packages/orchestrator/internal/sandbox/fc/client.go, in setMachineConfiguration:

machineConfig := &models.MachineConfiguration{
    VcpuCount:       &vCPUCount,
    MemSizeMib:      &memoryMB,
    Smt:             &smt,
    TrackDirtyPages: &trackDirtyPages,
    // CPUTemplate is not set — defaults to None
}

The MachineConfiguration struct already has a CPUTemplate *CPUTemplate field, and the CPUTemplate enum includes T2CL (a Cascade Lake baseline). It just isn't being used.
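To illustrate why leaving the field unset means "no template", here is a sketch using a hypothetical, trimmed-down stand-in for the generated models.MachineConfiguration. The real type lives in the orchestrator's generated Firecracker client, and the JSON tags below (including omitempty on cpu_template) are an assumption. With a nil pointer, cpu_template never reaches PUT /machine-config, so Firecracker falls back to its default of None.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CPUTemplate is a local stand-in for the generated models.CPUTemplate enum.
type CPUTemplate string

// MachineConfiguration is a hypothetical mirror of the generated struct,
// limited to the fields used in setMachineConfiguration.
type MachineConfiguration struct {
	VcpuCount       *int64       `json:"vcpu_count"`
	MemSizeMib      *int64       `json:"mem_size_mib"`
	Smt             *bool        `json:"smt,omitempty"`
	TrackDirtyPages *bool        `json:"track_dirty_pages,omitempty"`
	CPUTemplate     *CPUTemplate `json:"cpu_template,omitempty"`
}

// marshalConfig renders the request body for a 2 vCPU / 512 MiB VM
// with an optional CPU template.
func marshalConfig(tmpl *CPUTemplate) string {
	vcpus, mem := int64(2), int64(512)
	cfg := MachineConfiguration{VcpuCount: &vcpus, MemSizeMib: &mem, CPUTemplate: tmpl}
	b, err := json.Marshal(cfg)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// Nil pointer: cpu_template is omitted entirely (current behaviour).
	fmt.Println(marshalConfig(nil))
	// Explicit T2CL: the template name is sent to Firecracker.
	t2cl := CPUTemplate("T2CL")
	fmt.Println(marshalConfig(&t2cl))
}
```

The first line printed is {"vcpu_count":2,"mem_size_mib":512} and the second adds "cpu_template":"T2CL".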

Proposed Solution

Option 1: Configurable built-in CPU template (minimal change)

Add an environment variable FC_CPU_TEMPLATE (default None for backward compatibility) that sets the built-in Firecracker CPU template:

cpuTemplateStr := os.Getenv("FC_CPU_TEMPLATE") // e.g., "T2CL"
if cpuTemplateStr != "" && cpuTemplateStr != "None" {
    cpuTemplate := models.CPUTemplate(cpuTemplateStr)
    machineConfig.CPUTemplate = &cpuTemplate
}

Self-hosters with mixed CPU clusters would set FC_CPU_TEMPLATE=T2CL to normalize guest CPUID to the Cascade Lake baseline.
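The snippet above converts any non-empty string straight into a CPUTemplate. A variant that validates the value at startup would let a typo fail fast in the orchestrator instead of at VM boot. This is a minimal sketch assuming the names of Firecracker's built-in (static) templates (C3, T2, T2S, T2CL, T2A, V1N1); which of them a given Firecracker version accepts on a given host should be checked against that version's documentation. parseCPUTemplate and the local CPUTemplate type are hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// CPUTemplate is a local stand-in for the generated models.CPUTemplate enum.
type CPUTemplate string

// knownStaticTemplates lists Firecracker's built-in template names (assumed);
// support varies by Firecracker version and host CPU vendor.
var knownStaticTemplates = map[string]bool{
	"C3": true, "T2": true, "T2S": true, "T2CL": true, "T2A": true, "V1N1": true,
}

// parseCPUTemplate maps an FC_CPU_TEMPLATE value to an optional template.
// "" and "None" return nil, keeping today's behaviour; unknown names are errors.
func parseCPUTemplate(v string) (*CPUTemplate, error) {
	if v == "" || v == "None" {
		return nil, nil
	}
	if !knownStaticTemplates[v] {
		return nil, fmt.Errorf("unsupported FC_CPU_TEMPLATE %q", v)
	}
	t := CPUTemplate(v)
	return &t, nil
}

func main() {
	tmpl, err := parseCPUTemplate(os.Getenv("FC_CPU_TEMPLATE"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if tmpl == nil {
		fmt.Println("no CPU template (Firecracker default)")
		return
	}
	fmt.Println("CPU template:", *tmpl)
}
```

The result would be assigned to machineConfig.CPUTemplate as in the snippet above; rejecting lowercase or misspelled names here is a deliberate choice, since the template names are case-sensitive enum values.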

Option 2: Custom CPU template support (advanced)

Firecracker supports custom CPU templates via JSON that allow fine-grained control over CPUID leaves and MSR modifiers. This would allow self-hosters to define the exact intersection of their fleet's CPU capabilities.
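As a rough illustration, the sketch below generates a custom template that masks a few Ice Lake-only bits in CPUID leaf 0x7, subleaf 0x0, register ECX: bit 6 (AVX512_VBMI2), bit 8 (GFNI), bit 9 (VAES) and bit 10 (VPCLMULQDQ). It assumes Firecracker's x86_64 custom template layout: a cpuid_modifiers list with leaf, subleaf, flags and per-register bitmap strings written MSB first, where 0/1 force a bit and x leaves it unchanged. The Go type names are hypothetical, and this is not a complete Ice Lake to Cascade Lake intersection; a real template would have to cover every differing leaf and MSR.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Hypothetical Go types mirroring the assumed custom template JSON layout.
type RegisterModifier struct {
	Register string `json:"register"`
	Bitmap   string `json:"bitmap"`
}

type CPUIDModifier struct {
	Leaf      string             `json:"leaf"`
	Subleaf   string             `json:"subleaf"`
	Flags     int                `json:"flags"`
	Modifiers []RegisterModifier `json:"modifiers"`
}

type CustomTemplate struct {
	CPUIDModifiers []CPUIDModifier `json:"cpuid_modifiers"`
}

// clearBitsBitmap returns a "0b..." bitmap of the given width, MSB first,
// with each listed bit forced to 0 and all other bits left as "x".
func clearBitsBitmap(width int, bits []int) string {
	chars := []byte(strings.Repeat("x", width))
	for _, b := range bits {
		if b < 0 || b >= width {
			panic(fmt.Sprintf("bit %d out of range for width %d", b, width))
		}
		chars[width-1-b] = '0'
	}
	return "0b" + string(chars)
}

func main() {
	// CPUID.(EAX=7,ECX=0):ECX bits 6, 8, 9, 10 =
	// AVX512_VBMI2, GFNI, VAES, VPCLMULQDQ (present on Ice Lake, absent on Cascade Lake).
	tmpl := CustomTemplate{CPUIDModifiers: []CPUIDModifier{{
		Leaf:    "0x7",
		Subleaf: "0x0",
		Flags:   0,
		Modifiers: []RegisterModifier{{
			Register: "ecx",
			Bitmap:   clearBitsBitmap(32, []int{6, 8, 9, 10}),
		}},
	}}}
	out, err := json.MarshalIndent(tmpl, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Generating the bitmaps from a list of bit numbers, rather than hand-writing 32-character strings, makes it easier to review which features a fleet-wide template actually hides.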

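Add an optional FC_CUSTOM_CPU_TEMPLATE_PATH env var pointing to a JSON file that the orchestrator submits to Firecracker before boot. Custom templates are applied through the PUT /cpu-config API; the cpu_template field of PUT /machine-config only accepts the built-in template names.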
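The orchestrator side of Option 2 could read the file and send it over the Firecracker API socket before InstanceStart. This sketch assumes the PUT /cpu-config endpoint that Firecracker uses for custom CPU templates; unixSocketClient, putCPUConfig and the socket path /tmp/firecracker.sock are hypothetical, and the real orchestrator would presumably go through its generated Firecracker client instead of raw HTTP.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"os"
	"time"
)

// unixSocketClient returns an HTTP client that dials the Firecracker API
// unix socket at socketPath for every request, ignoring the URL host.
func unixSocketClient(socketPath string) *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				var d net.Dialer
				return d.DialContext(ctx, "unix", socketPath)
			},
		},
		Timeout: 10 * time.Second,
	}
}

// putCPUConfig sends a custom CPU template (raw JSON) to PUT /cpu-config.
// The endpoint is assumed to be pre-boot only, so call it before InstanceStart.
func putCPUConfig(c *http.Client, baseURL string, template []byte) error {
	req, err := http.NewRequest(http.MethodPut, baseURL+"/cpu-config", bytes.NewReader(template))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := c.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		body, _ := io.ReadAll(resp.Body)
		return fmt.Errorf("PUT /cpu-config: %s: %s", resp.Status, body)
	}
	return nil
}

func main() {
	path := os.Getenv("FC_CUSTOM_CPU_TEMPLATE_PATH")
	if path == "" {
		return // Option 2 disabled: keep current behaviour.
	}
	tmpl, err := os.ReadFile(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read template:", err)
		os.Exit(1)
	}
	// Hypothetical socket path; the orchestrator tracks the real one per sandbox.
	client := unixSocketClient("/tmp/firecracker.sock")
	if err := putCPUConfig(client, "http://localhost", tmpl); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Sending the file contents unmodified keeps the orchestrator agnostic to the template schema and leaves validation to Firecracker, which reports a non-2xx status for a template it rejects.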

Recommendation

Option 1 is sufficient for most self-hosted scenarios and requires minimal code change. Option 2 could be added later for advanced use cases.

Environment

  • E2B Infra: self-hosted (custom-infra branch)
  • Firecracker: v1.5.0 / v1.7.0-dev / v1.10.1 / v1.12.1 (mixed versions across nodes)
  • Host CPUs: Intel Xeon Silver 4216 (Cascade Lake) + 4314 (Ice Lake)
  • Guest Node.js: v22.22.2
  • OS: Ubuntu 22.04 / 24.04
