vllm-serving-planner
Skill by mouadja02
Use when planning, reviewing, or tuning vLLM or OpenAI-compatible LLM serving for throughput, latency, KV-cache pressure, batching, quantization, prefix caching, or multimodal serving.
Details
- Path: skills/llm-tooling/vllm-serving-planner
- Bundled scripts: 1
- Dependencies: 1
Bundled scripts
- skills/llm-tooling/vllm-serving-planner/scripts/vllm_capacity_planner.py
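For a sense of the KV-cache arithmetic that this kind of capacity planning usually starts from, here is a minimal sketch. It is not the contents of the bundled script, which are not shown on this page. The function names and the example model shape (32 layers, 32 KV heads, head dimension 128, 2-byte elements) are illustrative assumptions. The estimate also ignores block-level fragmentation and any memory the serving engine reserves for itself.

```python
def kv_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int,
                       dtype_bytes: int = 2) -> int:
    """Bytes of KV cache one token occupies across all layers.

    The leading factor of 2 accounts for storing both a key and a value
    vector per layer and per KV head.
    """
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def max_concurrent_sequences(kv_budget_bytes: int, bytes_per_token: int,
                             tokens_per_sequence: int) -> int:
    """Rough upper bound on full-length sequences that fit in the KV budget."""
    total_tokens = kv_budget_bytes // bytes_per_token
    return total_tokens // tokens_per_sequence


if __name__ == "__main__":
    # Hypothetical model shape and memory budget, chosen for illustration only.
    per_token = kv_bytes_per_token(num_layers=32, num_kv_heads=32, head_dim=128)
    budget = 16 * 1024**3  # assume 16 GiB left for KV cache after weights
    print(per_token)
    print(max_concurrent_sequences(budget, per_token, tokens_per_sequence=4096))
```

With these assumed numbers each token costs 0.5 MiB of cache, so a 16 GiB budget holds about eight 4096-token sequences at once. Grouped-query attention (fewer KV heads) or a smaller cache element size would raise that bound proportionally.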