vLLM

MLOps
Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs, optimizing inference latency/throughput, or serving models with limited GPU memory. Supports OpenAI-compatible endpoints, quantization (GPTQ/AWQ/FP8), and tensor parallelism.
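Because vLLM exposes an OpenAI-compatible endpoint, any HTTP client can query a served model. The sketch below, using only the Python standard library, builds a `/chat/completions` payload and sends it to a locally running server; the base URL reflects vLLM's default port, while the model name is an assumption standing in for whichever model you served.

```python
import json
from urllib import request

# Default base URL for vLLM's OpenAI-compatible server; the model name is
# an assumption -- substitute whatever model you launched with `vllm serve`.
BASE_URL = "http://localhost:8000/v1"
MODEL = "meta-llama/Llama-3.1-8B-Instruct"


def build_chat_request(prompt: str, model: str = MODEL) -> dict:
    """Build an OpenAI-compatible /chat/completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
        "temperature": 0.2,
    }


def chat(prompt: str) -> str:
    """POST the payload to the server and return the assistant's reply."""
    payload = json.dumps(build_chat_request(prompt)).encode()
    req = request.Request(
        f"{BASE_URL}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With a server started via `vllm serve <model>`, calling `chat("...")` returns the generated text; the same payload shape works with any OpenAI-compatible client library.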

Enabled by default · Built In
CLI install command: `aegis skills install vllm`
Overview

Bundled with the packaged Aegis CLI as a built-in procedural skill. Because it already ships inside the packaged Aegis bundle, run `aegis skills install vllm` only when you want an explicit local materialization record.