vLLM + Qwen3.5-122B-A10B-NVFP4 on NVIDIA DGX Spark (GB10/SM121) — single-GPU NVFP4 W4A4 with MTP speculative decoding, self-contained Docker build
Updated Mar 12, 2026 · Python
Wheels & Docker images for running vLLM on CPU-only systems, optimized for different CPU instruction sets
Multi-agent system for automated mobile QA testing using LLMs, ADB, and vision-grounded execution with Simular Agent S3
Run an AI vision model on Google Colab and call it from your local machine through a public API.