KD-Zero: Evolving Knowledge Distiller for Any Teacher-Student Pairs
Neural Information Processing Systems
Knowledge distillation (KD) has emerged as an effective model-compression technique for enhancing lightweight models. Conventional KD methods propose various designs that allow the student model to better imitate the teacher. However, these handcrafted KD designs rely heavily on expert knowledge and may be sub-optimal for diverse teacher-student pairs. In this paper, we present KD-Zero, a novel framework that uses evolutionary search to automatically discover a promising distiller from scratch for any teacher-student architecture.
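For context, the hand-crafted baseline that KD-Zero aims to replace is the classic logit-matching distillation loss. The sketch below is illustrative only (the function names and the temperature value are assumptions, not part of the paper): it computes the temperature-scaled KL divergence between teacher and student logits.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T softens the distribution.
    e = np.exp((z - z.max(axis=-1, keepdims=True)) / T)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=4.0):
    # Classic logit-distillation term: T^2 * KL(teacher_soft || student_soft),
    # averaged over the batch. T^2 rescales gradients to the hard-label scale.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float((T ** 2) * np.sum(p * (np.log(p) - np.log(q))) / len(p))
```

A handcrafted choice like this (which logits to match, which divergence, which temperature) is exactly the kind of design decision the paper's evolutionary search explores automatically.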
May-25-2025, 13:47:52 GMT
- Country:
- Asia (0.14)
- Genre:
- Research Report (0.93)
- Industry:
- Education (0.88)
- Technology:
- Information Technology > Artificial Intelligence
- Cognitive Science (0.91)
- Machine Learning
- Evolutionary Systems (0.88)
- Neural Networks > Deep Learning (0.46)
- Representation & Reasoning > Search (0.91)
- Vision (1.00)