DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs
Jongwoo Ko, Tianyi Chen, Sungnyun Kim, Tianyu Ding, Luming Liang, Ilya Zharkov, Se-Young Yun
arXiv.org Artificial Intelligence
Despite the success of distillation in large language models (LLMs), most prior work applies identical loss functions to both teacher- and student-generated data. These strategies overlook the synergy between loss formulations and data types, leading to a suboptimal performance boost in student models. To address this, we propose DistiLLM-2, a contrastive approach that simultaneously increases the likelihood of teacher responses and decreases that of student responses by harnessing this synergy. Our extensive experiments show that DistiLLM-2 not only builds high-performing student models across a wide range of tasks, including instruction-following and code generation, but also supports diverse applications, such as preference alignment and vision-language extensions. These findings highlight the potential of a contrastive approach to enhance the efficacy of LLM distillation by effectively aligning teacher and student models across varied data types.
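To make the core idea concrete, below is a minimal, hypothetical sketch of a contrastive distillation objective in the spirit of the abstract: raising the student's likelihood of teacher-generated responses while lowering its likelihood of its own generated responses. The function and parameter names (`sequence_log_prob`, `contrastive_distill_loss`, `beta`) are illustrative assumptions, not the paper's actual formulation or code.

```python
# Hypothetical sketch of a contrastive distillation loss; not DistiLLM-2's exact objective.
import torch
import torch.nn.functional as F


def sequence_log_prob(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Mean per-token log-probability of `labels` under `logits`.

    logits: (batch, seq_len, vocab); labels: (batch, seq_len) with -100 marking padding.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    mask = labels.ne(-100)
    safe_labels = labels.clamp_min(0)  # avoid indexing with -100
    token_lp = log_probs.gather(-1, safe_labels.unsqueeze(-1)).squeeze(-1)
    return (token_lp * mask).sum(dim=-1) / mask.sum(dim=-1).clamp_min(1)


def contrastive_distill_loss(
    student_logits_on_teacher_resp: torch.Tensor,
    teacher_resp_labels: torch.Tensor,
    student_logits_on_student_resp: torch.Tensor,
    student_resp_labels: torch.Tensor,
    beta: float = 1.0,
) -> torch.Tensor:
    """Increase likelihood of teacher responses, decrease likelihood of student responses."""
    lp_teacher_resp = sequence_log_prob(student_logits_on_teacher_resp, teacher_resp_labels)
    lp_student_resp = sequence_log_prob(student_logits_on_student_resp, student_resp_labels)
    # Softplus keeps the contrastive margin bounded below; the paper pairs different
    # loss terms with teacher- and student-generated data, which this sketch abstracts away.
    return F.softplus(-(lp_teacher_resp - beta * lp_student_resp)).mean()
```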
Mar-10-2025