TRL Fine Tuning
MLOps
Fine-tune LLMs using reinforcement learning with TRL: SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use it when you need RLHF, want to align a model with preferences, or want to train from human feedback. Works with Hugging Face Transformers.
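To make "DPO for preference alignment" concrete, here is a minimal sketch of the standard DPO objective for a single preference pair, written in plain Python. It only illustrates the formula and is not TRL's implementation; in TRL the training loop is handled by `DPOTrainer`. The function name `dpo_loss`, its parameter names, and the default `beta=0.1` are assumptions chosen for this example.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    `dpo_loss` is a hypothetical helper for illustration only.
    """
    # Implicit rewards: how much more the policy likes each response
    # than the reference model does.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin; beta controls how strongly the policy
    # is pushed away from the reference model.
    margin = beta * (chosen_reward - rejected_reward)

    # -log(sigmoid(margin)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Example: the policy already prefers the chosen response more than the
# reference does, so the loss falls below log(2), the zero-margin value.
loss = dpo_loss(-5.0, -12.0, -8.0, -8.0)
print(loss < math.log(2))
```

When the policy and reference models agree exactly, the margin is zero and the loss equals log(2); training lowers the loss by widening the margin in favor of the chosen response.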
Enabled by default | Built In
CLI install command
aegis skills install trl-fine-tuning
Bundled with the packaged Aegis CLI as a built-in procedural skill.
Already ships inside the packaged Aegis bundle. Use `aegis skills install trl-fine-tuning` only when you want an explicit local materialization record.