vLLM

MLOps
Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs, optimizing inference latency/throughput, or serving models with limited GPU memory. Supports OpenAI-compatible endpoints, quantization (GPTQ/AWQ/FP8), and tensor parallelism.
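Because vLLM exposes an OpenAI-compatible endpoint, any HTTP client can query a served model. The sketch below, using only the Python standard library, builds a `/chat/completions` payload and sends it to a locally running server; the base URL reflects vLLM's default port, while the model name is an assumption standing in for whichever model you served.

```python
import json
from urllib import request

# Default base URL for vLLM's OpenAI-compatible server; the model name is
# an assumption -- substitute whatever model you launched with `vllm serve`.
BASE_URL = "http://localhost:8000/v1"
MODEL = "meta-llama/Llama-3.1-8B-Instruct"


def build_chat_request(prompt: str, model: str = MODEL) -> dict:
    """Build an OpenAI-compatible /chat/completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
        "temperature": 0.2,
    }


def chat(prompt: str) -> str:
    """POST the payload to the server and return the assistant's reply."""
    payload = json.dumps(build_chat_request(prompt)).encode()
    req = request.Request(
        f"{BASE_URL}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With a server started via `vllm serve <model>`, calling `chat("...")` returns the generated text; the same payload shape works with any OpenAI-compatible client library.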

Enabled by default · Built In
CLI install command: `aegis skills install vllm`
Overview

Bundled with the packaged Aegis CLI as a built-in procedural skill. Because it already ships inside the packaged Aegis bundle, run `aegis skills install vllm` only when you want an explicit local materialization record.