vLLM + Qwen3.5-122B-A10B-NVFP4 on NVIDIA DGX Spark (GB10/SM121) — single-GPU NVFP4 W4A4 with MTP speculative decoding, self-contained Docker build
Updated Mar 12, 2026 · Python
Wheels & Docker images for running vLLM on CPU-only systems, optimized for different CPU instruction sets
Multi-agent system for automated mobile QA testing using LLMs, ADB, and vision-grounded execution with Simular Agent S3
Run an AI vision model on Google Colab and call it from your local machine through a public API.