KD-Zero: Evolving Knowledge Distiller for Any Teacher-Student Pairs
Neural Information Processing Systems
Knowledge distillation (KD) has emerged as an effective model-compression technique for enhancing lightweight models. Conventional KD methods propose various designs that allow the student model to better imitate the teacher. However, these handcrafted KD designs rely heavily on expert knowledge and may be sub-optimal for diverse teacher-student pairs. In this paper, we present KD-Zero, a novel framework that uses evolutionary search to automatically discover a promising distiller from scratch for any teacher-student architecture.
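For context, the hand-crafted baseline that KD-Zero aims to replace is the classic logit-matching distillation loss. The sketch below is illustrative only (the function names and the temperature value are assumptions, not part of the paper): it computes the temperature-scaled KL divergence between teacher and student logits.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T softens the distribution.
    e = np.exp((z - z.max(axis=-1, keepdims=True)) / T)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=4.0):
    # Classic logit-distillation term: T^2 * KL(teacher_soft || student_soft),
    # averaged over the batch. T^2 rescales gradients to the hard-label scale.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float((T ** 2) * np.sum(p * (np.log(p) - np.log(q))) / len(p))
```

A handcrafted choice like this (which logits to match, which divergence, which temperature) is exactly the kind of design decision the paper's evolutionary search explores automatically.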
May-25-2025, 13:47:52 GMT
- Country:
- Asia (0.14)
- Genre:
- Research Report (0.93)
- Industry:
- Education (0.88)
- Technology:
- Information Technology > Artificial Intelligence
- Cognitive Science (0.91)
- Machine Learning
- Evolutionary Systems (0.88)
- Neural Networks > Deep Learning (0.46)
- Representation & Reasoning > Search (0.91)
- Vision (1.00)