KD-Zero: Evolving Knowledge Distiller for Any Teacher-Student Pairs
Neural Information Processing Systems
Knowledge distillation (KD) has emerged as an effective technique for compressing models and enhancing lightweight ones. Conventional KD methods propose various designs to help the student model imitate the teacher more closely. However, these handcrafted KD designs rely heavily on expert knowledge and may be sub-optimal for a given teacher-student pair. In this paper, we present KD-Zero, a novel framework that uses evolutionary search to automatically discover a promising distiller from scratch for any teacher-student architecture. We then construct the distiller search space by selecting advanced operations for each component of the distiller.
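As background for the handcrafted designs that KD-Zero searches beyond, the following is a minimal sketch of the classic fixed distillation loss (temperature-softened KL divergence between teacher and student logits). It is illustrative only, not KD-Zero's searched distiller; function names and the NumPy implementation are assumptions.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Numerically stable softmax with temperature T."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Classic distillation term: KL(teacher || student) on
    temperature-softened distributions, scaled by T^2 so gradient
    magnitudes stay comparable across temperatures."""
    p = softmax(teacher_logits, T)  # soft targets from the teacher
    q = softmax(student_logits, T)  # soft predictions from the student
    return float((T ** 2) * np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean())
```

A searched distiller, by contrast, would replace this fixed transformation/distance choice with operations selected by the evolutionary search.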