llama.cpp

MLOps


Run LLM inference with llama.cpp on CPU, Apple Silicon, AMD/Intel GPUs, or NVIDIA GPUs, plus GGUF model conversion and quantization (2–8 bit with K-quants and imatrix). Covers the CLI, Python bindings, the OpenAI-compatible server, and Ollama/LM Studio integration. Use it for edge deployment, M1/M2/M3/M4 Macs, CUDA-less environments, or flexible local quantization.
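To give a rough sense of what the quantization levels mean for footprint, the sketch below estimates the size of a model's weights from its parameter count and bits per weight. It models only the simple llama.cpp block formats `Q8_0` and `Q4_0`. Each block holds 32 weights plus one fp16 scale, so these formats work out to 8.5 and 4.5 bits per weight. K-quants and imatrix-guided mixes vary per tensor and are not modeled. Real GGUF files also store metadata and keep some tensors at higher precision, so treat the numbers as approximations. The table and function names are illustrative and are not part of llama.cpp.

```python
# Illustrative bits-per-weight table (hypothetical helper, not a llama.cpp API).
# Q8_0 / Q4_0 blocks: 32 weights + one 16-bit (fp16) scale per block.
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": (32 * 8 + 16) / 32,  # 8.5 bits per weight
    "Q4_0": (32 * 4 + 16) / 32,  # 4.5 bits per weight
}


def estimate_weight_bytes(n_params: int, quant: str) -> int:
    """Approximate size of the weights alone (ignores metadata, KV cache,
    and tensors that a real conversion may keep at higher precision)."""
    bpw = BITS_PER_WEIGHT[quant]
    return int(n_params * bpw / 8)


if __name__ == "__main__":
    n_params = 7_000_000_000  # e.g. a 7B-parameter model
    for quant in ("F16", "Q8_0", "Q4_0"):
        gib = estimate_weight_bytes(n_params, quant) / 2**30
        print(f"{quant:>5}: ~{gib:.1f} GiB")
```

The estimate is linear in bits per weight, so halving precision roughly halves the weight footprint. That is why 4-bit formats make 7B-class models practical on laptops and edge devices.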

Enabled by default | Built in

CLI install command: `aegis skills install llama-cpp`


Overview

Bundled with the packaged Aegis CLI as a built-in procedural skill.

Because the skill already ships inside the packaged Aegis bundle, you do not need to install it. Run `aegis skills install llama-cpp` only if you want an explicit local materialization record.

