vLLM
MLOps
Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs, optimizing inference latency/throughput, or serving models with limited GPU memory. Supports OpenAI-compatible endpoints, quantization (GPTQ/AWQ/FP8), and tensor parallelism.
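As a sketch of what the OpenAI-compatible surface mentioned above looks like, the snippet below builds a chat-completions request body of the kind a vLLM server accepts at `/v1/chat/completions`. The base URL and model name are illustrative assumptions, and the example only constructs and prints the payload rather than calling a live server.

```python
import json

# Assumed local endpoint for a vLLM OpenAI-compatible server;
# host, port, and model name are illustrative placeholders.
BASE_URL = "http://localhost:8000/v1"
MODEL = "meta-llama/Llama-3.1-8B-Instruct"

def chat_payload(prompt: str, max_tokens: int = 128) -> dict:
    """Build an OpenAI-compatible /v1/chat/completions request body."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = chat_payload("What is PagedAttention?")
print(json.dumps(payload, indent=2))
```

The same payload works unchanged against any OpenAI-compatible endpoint, which is what makes vLLM a drop-in backend for existing client code.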
Enabled by default · Built In
CLI install command
`aegis skills install vllm`
Bundled with the packaged Aegis CLI as a built-in procedural skill.
The skill already ships inside the packaged Aegis bundle, so installation is optional. Run `aegis skills install vllm` only when you want an explicit local materialization record.