LocalAIStack is an infrastructure-oriented software stack designed to manage local AI workstations as long-lived, reproducible computing environments.
This document describes:
- The architectural layers of LocalAIStack
- Core responsibilities of each layer
- Key design decisions and trade-offs
- Extension and evolution principles
It does not describe installation steps or end-user workflows.
LocalAIStack follows these core architectural principles:
- All core functionality must work without external network access; external services (e.g., model registries, translation APIs) are optional and replaceable.
- Software capabilities are constrained by detected hardware; LocalAIStack never assumes uniform hardware and never exposes functionality that the current machine cannot reliably support.
Each layer has a clearly defined responsibility:
- Control logic does not execute workloads
- Runtime does not decide policy
- Modules do not manage global state
Given the same hardware and configuration:
- Installation results must be deterministic
- Runtime behavior must be reproducible
- Version drift must be observable and reversible
LocalAIStack does not encode assumptions about:
- GPU vendors
- Model providers
- Framework ecosystems
- Cloud platforms
┌──────────────────────────────┐
│ Interfaces │
│ Web UI / CLI / API │
└──────────────▲───────────────┘
│
┌──────────────┴───────────────┐
│ Control Layer │
│ Policy, State, Resolution │
└──────────────▲───────────────┘
│
┌──────────────┴───────────────┐
│ Runtime Layer │
│ Containers / Native Exec │
└──────────────▲───────────────┘
│
┌──────────────┴───────────────┐
│ Software Modules │
│ Languages / AI / Services │
└──────────────────────────────┘
The interface layer is responsible for:
- User interaction
- Status visualization
- Operation triggering
It is exposed as:
- Web-based UI
- Command-line interface (CLI)
- Internal API (REST or gRPC)
It explicitly excludes:
- Direct package installation
- Hardware probing
- Policy decisions
Interfaces only issue intent; they never perform actions directly.
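The intent-issuing contract can be sketched minimally as follows; the `Intent` and `ControlStub` names are hypothetical illustrations, not the actual LocalAIStack API:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Intent:
    """An interface expresses what should happen, never how."""
    operation: str                      # e.g. "install", "upgrade", "status"
    target: str                         # module the intent refers to
    options: dict = field(default_factory=dict)

class ControlStub:
    """Minimal stand-in for the Control Layer: it alone acts on intents."""
    def __init__(self):
        self.received = []

    def submit(self, intent: Intent) -> str:
        # A real control layer would validate, plan, and execute here.
        self.received.append(intent)
        return f"accepted:{intent.operation}:{intent.target}"

# A CLI or Web UI only constructs and submits intents:
control = ControlStub()
ack = control.submit(Intent("install", "python-env"))
```

The interface never touches packages or hardware; it hands a description of the desired outcome to the control layer and reports the acknowledgement back to the user.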
The Control Layer is the core of LocalAIStack. It is responsible for:
- Hardware detection and classification
- Capability policy evaluation
- Software resolution and dependency planning
- State tracking and reconciliation
- Upgrade and rollback orchestration
Hardware detection discovers and normalizes hardware attributes:
- CPU cores and topology
- System memory
- GPU model, memory, and interconnects
- Storage characteristics
Outputs a hardware profile consumed by the policy engine.
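A normalized hardware profile might look like the following sketch; the field names and probe format are illustrative assumptions, not the real schema:

```python
def build_profile(raw: dict) -> dict:
    """Normalize raw probe output into a stable hardware profile."""
    return {
        "cpu_cores": int(raw.get("cores", 0)),
        "ram_gb": round(raw.get("mem_bytes", 0) / 2**30, 1),
        "gpus": [
            {"model": g["name"], "vram_gb": round(g["vram_bytes"] / 2**30, 1)}
            for g in raw.get("gpus", [])
        ],
        "storage": raw.get("storage_class", "unknown"),
    }

# Example: a single-GPU workstation, as a raw probe result.
profile = build_profile({
    "cores": 16,
    "mem_bytes": 64 * 2**30,
    "gpus": [{"name": "RTX 4090", "vram_bytes": 24 * 2**30}],
    "storage_class": "nvme",
})
```

Normalizing to fixed units (GiB, core counts) at this boundary means the policy engine never has to interpret vendor-specific probe output.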
The policy engine maps hardware profiles to allowed capabilities.
Example policies:
- Maximum supported model size
- Allowed inference runtimes
- Parallelism constraints
- Memory and VRAM thresholds
Policies are declarative and versioned.
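A declarative, versioned policy set could be evaluated roughly like this; the rule keys and thresholds are invented for illustration:

```python
# Illustrative policy set: data, not code, and carrying its own version.
POLICIES = {
    "version": "2024.1",
    "rules": [
        # Ordered least- to most-capable; later matches override earlier ones.
        {"capability": "max_model_params_b", "min_vram_gb": 8,  "value": 7},
        {"capability": "max_model_params_b", "min_vram_gb": 24, "value": 34},
    ],
}

def evaluate(profile: dict, policies: dict) -> dict:
    """Return the capabilities a hardware profile is allowed to use."""
    caps: dict = {}
    vram = max((g["vram_gb"] for g in profile.get("gpus", [])), default=0)
    for rule in policies["rules"]:
        if vram >= rule["min_vram_gb"]:
            caps[rule["capability"]] = rule["value"]
    return caps

caps = evaluate({"gpus": [{"vram_gb": 24}]}, POLICIES)
```

Because rules are plain data, a policy change is a new versioned document rather than a code change, which keeps capability decisions auditable.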
The software resolver determines:
- Which software modules are installable
- Compatible versions and combinations
- Required runtime backends
- Conflicting dependencies
The resolver produces an execution plan, not actions.
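Side-effect-free dependency planning can be sketched with a topological sort; the module names and the `plan` helper are hypothetical:

```python
from graphlib import TopologicalSorter

def plan(modules: dict, want: str) -> list:
    """Return an ordered install plan for `want` and its transitive deps.

    `modules` maps a module name to the list of modules it depends on.
    Nothing is installed here: the output is a plan, not an action.
    """
    graph = {}
    stack = [want]
    while stack:
        name = stack.pop()
        if name not in graph:
            graph[name] = set(modules[name])
            stack.extend(modules[name])
    # static_order() yields dependencies before their dependents.
    return list(TopologicalSorter(graph).static_order())

MODULES = {
    "llm-server": ["cuda-runtime", "python-env"],
    "cuda-runtime": [],
    "python-env": [],
}
order = plan(MODULES, "llm-server")
```

The runtime layer can then execute the plan step by step, which keeps deciding (resolver) and doing (runtime) cleanly separated.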
State tracking maintains system state:
- Installed modules
- Versions and hashes
- Runtime status
- Configuration overrides
State is persistent and auditable.
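One way to make state auditable is a hash-chained, append-only journal, sketched here with illustrative names; the document does not specify the real persistence format:

```python
import hashlib
import json

class StateJournal:
    """Append-only journal: each entry hashes its predecessor, so any
    tampering or drift breaks the chain and becomes observable."""

    def __init__(self):
        self.entries = []

    def record(self, event: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else "0" * 64
        payload = json.dumps(event, sort_keys=True)  # canonical encoding
        h = hashlib.sha256((prev + payload).encode()).hexdigest()
        self.entries.append({"event": event, "hash": h})
        return h

journal = StateJournal()
journal.record({"op": "install", "module": "python-env", "version": "3.12"})
journal.record({"op": "upgrade", "module": "python-env", "version": "3.13"})
```

Replaying the journal reproduces the expected system state; comparing it against reality is the basis for reconciliation.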
The Runtime Layer is responsible for executing software, not deciding what should exist.
Supported execution modes:
- Container-based (default)
- Native execution (performance-critical paths)
Responsibilities:
- Process lifecycle management
- Resource isolation
- Log collection
- Health reporting
Explicitly excluded:
- Dependency resolution
- Policy enforcement
- User-facing decisions
Software modules represent installable units. Categories include:
- Programming language environments
- AI inference engines
- AI frameworks
- Infrastructure services
- AI applications
- Developer tools
Each module is described by a manifest containing:
- Metadata
- Hardware requirements
- Dependencies
- Runtime constraints
- Exposed interfaces
- Optional integrity metadata (checksum/signature)
Modules are self-describing and independently versioned.
The module registry loads manifests, validates schema/integrity, and resolves dependency graphs to produce install plans with explicit version selection.
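Manifest schema validation might look like the following sketch; the required-field set mirrors the list above, but the exact schema is an assumption:

```python
# Illustrative required fields, not the actual LocalAIStack schema.
REQUIRED_FIELDS = {"name", "version", "hardware", "dependencies", "runtime"}

def validate_manifest(manifest: dict) -> list:
    """Return a list of schema errors; an empty list means the manifest is valid."""
    errors = [
        f"missing field: {f}"
        for f in sorted(REQUIRED_FIELDS - manifest.keys())
    ]
    if "version" in manifest and "." not in str(manifest["version"]):
        errors.append("version must be dotted, e.g. 1.0")
    return errors

good = {
    "name": "llm-server",
    "version": "1.2.0",
    "hardware": {"min_vram_gb": 8},
    "dependencies": ["python-env"],
    "runtime": "container",
}
errs = validate_manifest(good)
```

Rejecting malformed manifests before resolution means the dependency planner only ever sees self-consistent module descriptions.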
Model management is treated as a first-class concern.
It provides:
- Model metadata tracking
- Storage layout management
- Integrity verification
- Hardware compatibility checks
It explicitly avoids:
- Automatic model execution
- Preference for specific providers
Models are resources, not services.
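Integrity verification for model resources reduces to comparing digests; this sketch uses SHA-256 as an illustrative choice:

```python
import hashlib

def verify_model(data: bytes, expected_sha256: str) -> bool:
    """Check model bytes against a recorded digest; never executes the model."""
    return hashlib.sha256(data).hexdigest() == expected_sha256

# Illustrative stand-in for model weights on disk.
blob = b"fake model weights"
digest = hashlib.sha256(blob).hexdigest()
```

The expected digest would come from the model's metadata record; a mismatch marks the model as corrupt rather than triggering any automatic action.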
LocalAIStack classifies machines into capability tiers.
Example:
- Tier 1: Entry-level inference
- Tier 2: Mid-range local LLM workloads
- Tier 3: Multi-GPU and large-model systems
Tier definitions are policy-driven, not hardcoded.
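A policy-driven tier table can be evaluated like this; the tier boundaries here are invented examples, not the real definitions:

```python
# Illustrative tier policy, ordered most- to least-capable.
TIER_POLICY = [
    {"tier": 3, "min_vram_gb": 48, "min_gpus": 2},
    {"tier": 2, "min_vram_gb": 12, "min_gpus": 1},
    {"tier": 1, "min_vram_gb": 0,  "min_gpus": 0},
]

def classify(profile: dict) -> int:
    """Return the first (most capable) tier whose thresholds the profile meets."""
    gpus = profile.get("gpus", [])
    total_vram = sum(g["vram_gb"] for g in gpus)
    for rule in TIER_POLICY:
        if total_vram >= rule["min_vram_gb"] and len(gpus) >= rule["min_gpus"]:
            return rule["tier"]
    return 1  # fallback: entry-level
```

Because the table is data, shipping new tier boundaries is a policy update, not a code change.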
For user-facing text and translation:
- All UI text is key-based
- Language resolution occurs at the interface layer
- AI-assisted translation is optional and cacheable
- No runtime dependency on external translation services
- Translations must not affect system behavior
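Key-based text resolution with a default-language fallback can be sketched as follows; the catalog contents are illustrative:

```python
# Illustrative per-language catalogs; a cached AI-assisted translation
# would simply populate more of these tables offline.
CATALOG = {
    "en": {"status.ready": "Ready", "status.installing": "Installing..."},
    "de": {"status.ready": "Bereit"},
}

def t(key: str, lang: str, default_lang: str = "en") -> str:
    """Resolve a UI text key; fall back to the default language, then the key."""
    return CATALOG.get(lang, {}).get(key) or CATALOG[default_lang].get(key, key)
```

A missing translation degrades to the default language or the raw key; it never changes what the system does, only what the user reads.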
LocalAIStack is designed to be extended without modifying core logic. Supported extension points:
- New software modules
- Additional runtime backends
- Alternative interfaces
- Custom policy sets
Extensions are loaded via manifests and registered with the Control Layer.
LocalAIStack treats failure as a first-class condition.
- Atomic operations
- Explicit error states
- Partial installation detection
- Version pinning and rollback
Silent failure is considered a bug.
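Atomic apply-or-rollback can be sketched as paired do/undo steps; `apply_plan` is a hypothetical helper, not the actual implementation:

```python
def apply_plan(steps, state: list):
    """Apply (do, undo) steps in order; on failure, undo completed steps
    in reverse and re-raise, so no partial installation survives."""
    done = []
    try:
        for do, undo in steps:
            do(state)
            done.append(undo)
    except Exception:
        for undo in reversed(done):
            undo(state)
        raise  # explicit error state: failure is never silent

state = []
steps = [
    (lambda s: s.append("a"), lambda s: s.remove("a")),
    (lambda s: s.append("b"), lambda s: s.remove("b")),
]
apply_plan(steps, state)  # all steps succeed

def fail(_):
    raise RuntimeError("simulated failure")

broken = []
try:
    apply_plan(steps + [(fail, lambda s: None)], broken)
except RuntimeError:
    pass  # the partial work on `broken` has been rolled back
```

Pairing every action with its inverse at plan time is what makes version pinning and rollback mechanical rather than best-effort.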
LocalAIStack explicitly does not aim to be:
- A cloud orchestration system
- A training cluster manager
- A hosted SaaS platform
- A proprietary appliance OS
LocalAIStack is expected to evolve in phases:
- Stable mid-range local inference workflows
- Broader application ecosystem support
- Multi-node and collaborative scenarios (optional)
Backward compatibility and migration paths are mandatory concerns.
LocalAIStack is designed as infrastructure, not as an application bundle.
Its architecture prioritizes:
- Predictability over convenience
- Explicit policy over implicit behavior
- Long-term maintainability over short-term optimization