MLOps

Evaluation

Frames model, prompt, and system evaluation as a reproducible experiment with baselines, datasets, and explicit metrics.
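As a rough illustration of what "a reproducible experiment with baselines, datasets, and explicit metrics" can look like in practice, here is a minimal Python sketch. It is not the skill's implementation or any Aegis API. Every name in it (`Example`, `exact_match`, `run_eval`, `compare`, the toy `DATASET`, and the two stand-in systems) is a hypothetical chosen for illustration.

```python
import operator
import random
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Example:
    """One dataset row: an input prompt and its reference answer."""
    prompt: str
    expected: str


# Hypothetical toy dataset; a real evaluation would load a versioned file.
DATASET = [
    Example("2 + 2", "4"),
    Example("3 * 3", "9"),
    Example("10 - 7", "3"),
    Example("8 / 2", "4"),
]


def exact_match(prediction: str, expected: str) -> float:
    """Explicit metric: 1.0 when the normalized strings agree, else 0.0."""
    return float(prediction.strip().lower() == expected.strip().lower())


def baseline(prompt: str) -> str:
    """Stand-in baseline system: always answers "4"."""
    return "4"


def candidate(prompt: str) -> str:
    """Stand-in candidate system: evaluates "<int> <op> <int>" prompts."""
    ops = {"+": operator.add, "-": operator.sub,
           "*": operator.mul, "/": operator.floordiv}
    left, op, right = prompt.split()
    return str(ops[op](int(left), int(right)))


def run_eval(system: Callable[[str], str],
             dataset: Sequence[Example],
             metric: Callable[[str, str], float],
             sample_size: Optional[int] = None,
             seed: int = 0) -> float:
    """Mean metric over the dataset, or over a seeded subsample of it."""
    if not dataset:
        raise ValueError("dataset must not be empty")
    rng = random.Random(seed)  # fixed seed keeps the subsample repeatable
    examples = (list(dataset) if sample_size is None
                else rng.sample(list(dataset), sample_size))
    scores = [metric(system(ex.prompt), ex.expected) for ex in examples]
    return sum(scores) / len(scores)


def compare(base, cand, dataset, metric, seed: int = 0) -> dict:
    """Score both systems on the same data and report the difference."""
    b = run_eval(base, dataset, metric, seed=seed)
    c = run_eval(cand, dataset, metric, seed=seed)
    return {"baseline": b, "candidate": c, "delta": c - b}


report = compare(baseline, candidate, DATASET, exact_match)
print(report)  # {'baseline': 0.5, 'candidate': 1.0, 'delta': 0.5}
```

Both systems are scored on the same examples with the same metric and seed, so the reported delta reflects the systems and not the sampling.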

On demand, Available when invoked, Built In

Install from Aegis: `aegis skills install evaluation`

Overview

Bundled with the packaged Aegis CLI as a built-in procedural skill.

This skill already ships inside the packaged Aegis bundle. Run `aegis skills install evaluation` only when you want an explicit local materialization record.

Aliases

eval, evaluation, benchmark models

Trigger phrases

evaluate this model, benchmark these prompts, set up an eval harness

Keywords

evaluation, benchmark, dataset, metric, baseline, reproducibility