Intelligent load balancer for distributed vLLM server clusters
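A balancer like this typically routes each request to the replica with the fewest in-flight requests. A minimal least-connections sketch (the backend URLs and class name are illustrative, not taken from the project):

```python
import threading

class LeastConnectionsBalancer:
    """Route each request to the vLLM replica with the fewest in-flight requests."""

    def __init__(self, backends):
        self._lock = threading.Lock()
        # Map backend URL -> number of requests currently in flight.
        self._inflight = {url: 0 for url in backends}

    def acquire(self):
        """Pick the least-loaded backend and mark one request in flight."""
        with self._lock:
            url = min(self._inflight, key=self._inflight.get)
            self._inflight[url] += 1
            return url

    def release(self, url):
        """Mark a request on `url` as finished."""
        with self._lock:
            self._inflight[url] -= 1

balancer = LeastConnectionsBalancer(
    ["http://vllm-0:8000", "http://vllm-1:8000"]
)
first = balancer.acquire()   # least loaded replica (ties broken by order)
second = balancer.acquire()  # the other replica, now the least loaded
```

A real balancer would additionally weigh replica health and queue depth reported by each vLLM server.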
Updated Oct 22, 2025 - Python
agentsculptor is an experimental AI-powered development agent designed to analyze, refactor, and extend Python projects automatically. It uses an OpenAI-like planner–executor loop on top of a vLLM backend, combining project context analysis, structured tool calls, and iterative refinement. It has only been tested with gpt-oss-120b via vLLM.
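The planner–executor loop described above can be sketched as: a planner proposes the next structured tool call, an executor runs it, and the observation is fed back until the planner declares completion. A minimal sketch with a stubbed planner (the tool names, stop condition, and helper names are illustrative; the real agent drives an LLM on a vLLM backend):

```python
def run_agent(planner, tools, goal, max_steps=10):
    """Planner-executor loop: the planner picks a tool call, the executor runs it,
    and the observation is appended to history until the planner returns 'done'."""
    history = [("goal", goal)]
    for _ in range(max_steps):
        action, arg = planner(history)          # an LLM call in the real agent
        if action == "done":
            return arg
        observation = tools[action](arg)        # structured tool call
        history.append((action, observation))   # iterative refinement
    raise RuntimeError("planner did not finish within max_steps")

# Stub planner: read the file once, then report how much was read.
def stub_planner(history):
    if len(history) == 1:
        return "read_file", "main.py"
    return "done", f"{len(history[-1][1])} chars read"

tools = {"read_file": lambda path: "print('hello')\n"}
result = run_agent(stub_planner, tools, "summarize main.py")  # "15 chars read"
```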
A curated list of plugins built on top of vLLM
Wheels & Docker images for running vLLM on CPU-only systems, optimized for different CPU instruction sets
The core source files of this self-hostable successor to the OpenAI Assistants API. To contribute to the core logic, fork or submit pull requests to this repo.
Performant LLM inference on Kubernetes via vLLM
This repository contains Terraform configuration for deploying the vLLM production stack on cloud-managed Kubernetes.
Deploy the Magistral-Small-2506 model using vLLM and Modal
Qwen 3.5 Reverse Proxy for handling instant / thinking modes and their variants automatically
[KAIST CS632] Road damage detection using YOLOv8 on a Xilinx FPGA, repair estimation with a Phi-3.5 + FAISS RAG pipeline served via vLLM, and data management via GS1 EPCISv2 with a React dashboard
This project offers a production-ready RAG (Retrieval-Augmented Generation) API running on FastAPI, utilizing the high-performance vLLM engine.
Load testing openai/gpt-oss-20b with vLLM and Docker
Deploy the SOTA Qwen 3.5 9B model serverlessly
A reproducible benchmarking suite for vLLM inference. Measure latency, throughput, and VRAM across model configurations, quantization schemes, and deployment environments.
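The headline metrics such a suite reports can be computed from per-request timings alone. A sketch of the aggregation step (the function name and sample latencies are made up for illustration):

```python
import statistics

def summarize(latencies_s, total_tokens, wall_time_s):
    """Aggregate per-request latencies into common inference benchmark numbers."""
    ordered = sorted(latencies_s)
    # Nearest-rank index for the 95th percentile.
    p95_idx = max(0, round(0.95 * (len(ordered) - 1)))
    return {
        "p50_s": statistics.median(ordered),
        "p95_s": ordered[p95_idx],
        "throughput_tok_per_s": total_tokens / wall_time_s,
    }

# Example: 5 requests, 1000 generated tokens, 4 seconds of wall time.
stats = summarize([0.8, 1.0, 1.2, 0.9, 1.1], total_tokens=1000, wall_time_s=4.0)
```

VRAM would be sampled separately (e.g. from the driver) while the requests run; it cannot be derived from timings.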
A multi-tenant large language model gateway built on vLLM. The system is designed to serve many concurrent requests from many users: it uses vLLM as its inference engine, a scheduler to order user queries, and a limiter to cap each user's usage. It also supports LoRA adapters in vLLM.
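A per-user limiter of this kind is commonly a token bucket: each user accrues request credits at a fixed rate up to a cap, and a request is admitted only while credits remain. A minimal sketch (the class name, rates, and injected clock are illustrative):

```python
class UserRateLimiter:
    """Per-user token bucket: refill `rate` credits/sec up to `capacity`."""

    def __init__(self, rate, capacity, clock):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.buckets = {}  # user -> (credits, last_refill_time)

    def allow(self, user):
        now = self.clock()
        credits, last = self.buckets.get(user, (self.capacity, now))
        # Refill credits accrued since the last check, capped at capacity.
        credits = min(self.capacity, credits + (now - last) * self.rate)
        if credits >= 1.0:
            self.buckets[user] = (credits - 1.0, now)
            return True
        self.buckets[user] = (credits, now)
        return False

t = [0.0]  # fake clock, advanced manually for the example
limiter = UserRateLimiter(rate=1.0, capacity=2, clock=lambda: t[0])
burst = [limiter.allow("alice") for _ in range(3)]  # 2 admitted, then denied
t[0] = 1.0                                          # 1s later: 1 credit refilled
late = limiter.allow("alice")                       # admitted again
```

In a gateway, the denied branch would map to an HTTP 429 before the request ever reaches the vLLM scheduler.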
This is an OCR model fine-tuned from Vintern1B (InternVL 1B) with 1 billion parameters. The model can recognize text in many different contexts, such as handwriting, printed text, and text on real-world objects.
Distributed ML inference across a desktop RTX 3060 and a Raspberry Pi 4B, connected with Ray.
Project to set up a UI for users to interact with an LLM served via vLLM