Advantage-Guided Distillation for Preference Alignment in Small Language Models
