DistiLLM: Towards Streamlined Distillation for Large Language Models
Ko, Jongwoo, Kim, Sungnyun, Chen, Tianyi, Yun, Se-Young
–arXiv.org Artificial Intelligence
Knowledge distillation (KD) is widely used for compressing a teacher model to a smaller student model, reducing its inference cost and memory footprint while preserving model capabilities. However, current KD methods for auto-regressive sequence models (e.g., large language models) lack a standardized objective function. Moreover, the recent use of student-generated outputs to address training-inference mismatches has significantly escalated computational costs. To tackle these issues, we introduce DistiLLM, a more effective and efficient KD framework for auto-regressive language models. DistiLLM comprises two components: (1) a novel skew Kullback-Leibler divergence loss, whose theoretical properties we unveil and leverage, and (2) an adaptive off-policy approach designed to improve the efficiency of utilizing student-generated outputs. Extensive experiments, including instruction-following tasks, demonstrate the effectiveness of DistiLLM in building high-performing student models while achieving up to 4.3$\times$ speedup compared to recent KD methods.
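The skew Kullback-Leibler divergence mentioned in the abstract replaces the second argument of the standard KL with a mixture of the two distributions, which keeps the divergence finite and well-behaved even where the two distributions barely overlap. Below is a minimal sketch for discrete distributions; the function names and the choice of mixing coefficient `alpha` are illustrative, not taken from the paper.

```python
import math

def kl_divergence(p, q):
    """Standard KL divergence KL(p || q) for discrete distributions
    given as lists of probabilities over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def skew_kl(p, q, alpha=0.1):
    """Skew KL divergence: KL(p || alpha*p + (1-alpha)*q).

    Mixing a fraction alpha of p into the second argument guarantees the
    mixture covers p's support, so the divergence stays finite even when
    q assigns zero probability to an outcome that p supports.
    With alpha = 0 this reduces to the standard KL divergence.
    """
    mixture = [alpha * pi + (1 - alpha) * qi for pi, qi in zip(p, q)]
    return kl_divergence(p, mixture)
```

By convexity of KL in its second argument, the skewed value is bounded above by `(1 - alpha) * KL(p || q)`, so it is always no larger than the standard divergence — one intuition for why such a loss can yield more stable optimization.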
Feb-6-2024
- Country:
- Asia
- China > Hong Kong (0.04)
- Japan > Honshū
- Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Middle East > UAE
- Abu Dhabi Emirate > Abu Dhabi (0.04)
- Europe
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- United Kingdom
- England > Leicestershire
- Leicester (0.04)
- Northern Ireland (0.04)
- Scotland (0.04)
- Wales (0.04)
- North America
- Canada
- British Columbia > Metro Vancouver Regional District
- Vancouver (0.04)
- Ontario > Toronto (0.04)
- United States
- Massachusetts > Hampden County
- Chicopee (0.04)
- Springfield (0.04)
- West Springfield (0.04)
- New Hampshire (0.04)
- Pennsylvania (0.04)
- Texas > Travis County
- Austin (0.04)
- South America > Colombia
- Meta Department > Villavicencio (0.04)
- Genre:
- Research Report > New Finding (0.67)
- Industry:
- Education (0.70)