vllm-serving-planner

Use when planning, reviewing, or tuning vLLM or OpenAI-compatible LLM serving for throughput, latency, KV-cache pressure, batching, quantization, prefix caching, or multimodal serving.

Details

Path: skills/llm-tooling/vllm-serving-planner
Bundled scripts: 1
Dependencies: 1

Bundled scripts

  • skills/llm-tooling/vllm-serving-planner/scripts/vllm_capacity_planner.py

FAQ