Intelligent load balancer for distributed vLLM server clusters
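A balancer like this typically routes each request to the replica with the fewest in-flight requests. A minimal least-connections sketch (the backend URLs and class name are illustrative, not taken from the project):

```python
import threading

class LeastConnectionsBalancer:
    """Route each request to the vLLM replica with the fewest in-flight requests."""

    def __init__(self, backends):
        self._lock = threading.Lock()
        # Map backend URL -> number of requests currently in flight.
        self._inflight = {url: 0 for url in backends}

    def acquire(self):
        """Pick the least-loaded backend and mark one request in flight."""
        with self._lock:
            url = min(self._inflight, key=self._inflight.get)
            self._inflight[url] += 1
            return url

    def release(self, url):
        """Mark a request on `url` as finished."""
        with self._lock:
            self._inflight[url] -= 1

balancer = LeastConnectionsBalancer(
    ["http://vllm-0:8000", "http://vllm-1:8000"]
)
first = balancer.acquire()   # least loaded replica (ties broken by order)
second = balancer.acquire()  # the other replica, now the least loaded
```

A real balancer would additionally weigh replica health and queue depth reported by each vLLM server.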
Updated Oct 22, 2025 - Python
agentsculptor is an experimental AI-powered development agent designed to analyze, refactor, and extend Python projects automatically. It uses an OpenAI-like planner–executor loop on top of a vLLM backend, combining project context analysis, structured tool calls, and iterative refinement. It has only been tested with gpt-oss-120b via vLLM.
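The planner–executor loop described above can be sketched as: a planner proposes the next structured tool call, an executor runs it, and the observation is fed back until the planner declares completion. A minimal sketch with a stubbed planner (the tool names, stop condition, and helper names are illustrative; the real agent drives an LLM on a vLLM backend):

```python
def run_agent(planner, tools, goal, max_steps=10):
    """Planner-executor loop: the planner picks a tool call, the executor runs it,
    and the observation is appended to history until the planner returns 'done'."""
    history = [("goal", goal)]
    for _ in range(max_steps):
        action, arg = planner(history)          # an LLM call in the real agent
        if action == "done":
            return arg
        observation = tools[action](arg)        # structured tool call
        history.append((action, observation))   # iterative refinement
    raise RuntimeError("planner did not finish within max_steps")

# Stub planner: read the file once, then report how much was read.
def stub_planner(history):
    if len(history) == 1:
        return "read_file", "main.py"
    return "done", f"{len(history[-1][1])} chars read"

tools = {"read_file": lambda path: "print('hello')\n"}
result = run_agent(stub_planner, tools, "summarize main.py")  # "15 chars read"
```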
A curated list of plugins built on top of vLLM
Wheels & Docker images for running vLLM on CPU-only systems, optimized for different CPU instruction sets
The core source files of this self-hostable successor to the OpenAI Assistants API. To contribute to the core logic, fork or submit pull requests to this repo.
Performant LLM inference on Kubernetes via vLLM
This repository contains Terraform configuration for deploying the vLLM production stack on cloud-managed Kubernetes.
Deploy the Magistral-Small-2506 model using vLLM and Modal
Qwen 3.5 Reverse Proxy for handling instant / thinking modes and their variants automatically
[KAIST CS632] Road damage detection using YOLOv8 on a Xilinx FPGA, repair estimation with a Phi-3.5 + FAISS RAG pipeline served via vLLM, and data management via GS1 EPCISv2 with a React dashboard
This project offers a production-ready RAG (Retrieval-Augmented Generation) API running on FastAPI, utilizing the high-performance vLLM engine.
Load testing openai/gpt-oss-20b with vLLM and Docker
Deploy the SOTA Qwen 3.5 9B model serverlessly
A reproducible benchmarking suite for vLLM inference. Measure latency, throughput, and VRAM across model configurations, quantization schemes, and deployment environments.
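The headline metrics such a suite reports can be computed from per-request timings alone. A sketch of the aggregation step (the function name and sample latencies are made up for illustration):

```python
import statistics

def summarize(latencies_s, total_tokens, wall_time_s):
    """Aggregate per-request latencies into common inference benchmark numbers."""
    ordered = sorted(latencies_s)
    # Nearest-rank index for the 95th percentile.
    p95_idx = max(0, round(0.95 * (len(ordered) - 1)))
    return {
        "p50_s": statistics.median(ordered),
        "p95_s": ordered[p95_idx],
        "throughput_tok_per_s": total_tokens / wall_time_s,
    }

# Example: 5 requests, 1000 generated tokens, 4 seconds of wall time.
stats = summarize([0.8, 1.0, 1.2, 0.9, 1.1], total_tokens=1000, wall_time_s=4.0)
```

VRAM would be sampled separately (e.g. from the driver) while the requests run; it cannot be derived from timings.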
A multi-tenant large language model gateway built on vLLM. The system is designed to serve many concurrent requests from many users: it uses vLLM as its inference engine, a scheduler to order user queries, and a limiter to cap each user's usage. It also supports LoRA adapters in vLLM.
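A per-user limiter of this kind is commonly a token bucket: each user accrues request credits at a fixed rate up to a cap, and a request is admitted only while credits remain. A minimal sketch (the class name, rates, and injected clock are illustrative):

```python
class UserRateLimiter:
    """Per-user token bucket: refill `rate` credits/sec up to `capacity`."""

    def __init__(self, rate, capacity, clock):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.buckets = {}  # user -> (credits, last_refill_time)

    def allow(self, user):
        now = self.clock()
        credits, last = self.buckets.get(user, (self.capacity, now))
        # Refill credits accrued since the last check, capped at capacity.
        credits = min(self.capacity, credits + (now - last) * self.rate)
        if credits >= 1.0:
            self.buckets[user] = (credits - 1.0, now)
            return True
        self.buckets[user] = (credits, now)
        return False

t = [0.0]  # fake clock, advanced manually for the example
limiter = UserRateLimiter(rate=1.0, capacity=2, clock=lambda: t[0])
burst = [limiter.allow("alice") for _ in range(3)]  # 2 admitted, then denied
t[0] = 1.0                                          # 1s later: 1 credit refilled
late = limiter.allow("alice")                       # admitted again
```

In a gateway, the denied branch would map to an HTTP 429 before the request ever reaches the vLLM scheduler.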
This is an OCR model fine-tuned from Vintern1B (InternVL 1B) with 1 billion parameters. The model can recognize text in many different contexts, such as handwriting, printed text, and text on real-world objects.
Distributed ML inference across a desktop RTX 3060 and a Raspberry Pi 4B, connected with Ray.
Project to set up a UI for users to interact with an LLM served via vLLM