LLM Inference MCP Server
Multi-model LLM routing with cost optimization, structured output, and batch inference.
Features
- Smart model routing (8 task profiles; routing is sketched after this list)
- Multi-provider support (OpenAI, DeepSeek, Anthropic, vLLM)
- Structured JSON output from any model
- Batch inference (up to 50 parallel prompts)
- Token counting and cost estimation (also covered in the sketch below)
- Side-by-side model comparison
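The routing and cost features can be pictured as two small lookups: each task profile maps to a preferred model, and cost is token counts multiplied by per-model prices. Below is a minimal Python sketch of that idea; the profile names, model IDs, and prices are illustrative placeholders, not the server's actual routing tables.

```python
# Illustrative sketch of profile-based routing plus cost estimation.
# Profile names, model IDs, and prices are placeholders, not the
# server's real configuration.

# Task profile -> preferred model (hypothetical assignments).
PROFILES = {
    "code": "deepseek-chat",
    "summarization": "gpt-4o-mini",
    "reasoning": "claude-3-5-sonnet-latest",
}

# USD per 1M tokens as (input, output) -- illustrative numbers only.
PRICES = {
    "deepseek-chat": (0.14, 0.28),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3-5-sonnet-latest": (3.00, 15.00),
}

def route(profile: str) -> str:
    """Return the model configured for a task profile."""
    return PROFILES[profile]

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate a request's USD cost from token counts and per-token prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

model = route("code")
print(model, f"${estimate_cost(model, 2_000, 500):.6f}")  # deepseek-chat $0.000420
```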
Tools (7)
list_models, chat_completion, structured_output, batch_inference, count_tokens, estimate_inference_cost, compare_models
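Since the server speaks MCP, any MCP client can drive these tools. Here is a minimal sketch using the official Python SDK (the `mcp` package); the `node dist/index.js` launch command and the `model`/`messages` argument names are assumptions guessed from the tool names, so check the repo's tool schemas for the real shapes.

```python
"""Minimal MCP client sketch for exercising the server's tools.

Assumptions (not confirmed against the repo): the server starts with
`node dist/index.js`, and chat_completion accepts `model` and `messages`
arguments. Verify both against the published tool schemas.
"""
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Launch command is an assumption; adjust to the repo's actual entry point.
    server = StdioServerParameters(command="node", args=["dist/index.js"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Discover the seven tools the README lists.
            tools = await session.list_tools()
            print([t.name for t in tools.tools])

            # Plain chat completion (argument names are assumptions).
            reply = await session.call_tool(
                "chat_completion",
                arguments={
                    "model": "gpt-4o-mini",
                    "messages": [{"role": "user", "content": "Say hi."}],
                },
            )
            print(reply.content)

asyncio.run(main())
```

The same `call_tool` pattern applies to the other tools, e.g. passing a list of prompts to batch_inference or a JSON schema to structured_output.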
GitHub: https://github.com/zhaohongyuziranerran/llm-inference-mcp