KD-Zero: Evolving Knowledge Distiller for Any Teacher-Student Pairs
Neural Information Processing Systems
Knowledge distillation (KD) has emerged as an effective technique for compressing models and enhancing lightweight ones. Conventional KD methods propose various designs to help the student model imitate the teacher more closely. However, these handcrafted KD designs rely heavily on expert knowledge and may be sub-optimal for a given teacher-student pair. In this paper, we present KD-Zero, a novel framework that uses evolutionary search to automatically discover a promising distiller from scratch for any teacher-student architecture. We then construct the distiller search space by selecting advanced operations for each component of the distiller.
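As background for the handcrafted designs that KD-Zero searches beyond, the following is a minimal sketch of the classic fixed distillation loss (temperature-softened KL divergence between teacher and student logits). It is illustrative only, not KD-Zero's searched distiller; function names and the NumPy implementation are assumptions.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Numerically stable softmax with temperature T."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Classic distillation term: KL(teacher || student) on
    temperature-softened distributions, scaled by T^2 so gradient
    magnitudes stay comparable across temperatures."""
    p = softmax(teacher_logits, T)  # soft targets from the teacher
    q = softmax(student_logits, T)  # soft predictions from the student
    return float((T ** 2) * np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean())
```

A searched distiller, by contrast, would replace this fixed transformation/distance choice with operations selected by the evolutionary search.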