Meta Learning for Knowledge Distillation

Zhou, Wangchunshu; Xu, Canwen; McAuley, Julian

Jun-8-2021–arXiv.org Artificial Intelligence

We present Meta Learning for Knowledge Distillation (MetaDistil), a simple yet effective alternative to traditional knowledge distillation (KD) methods, in which the teacher model is fixed during training. We show that the teacher network can learn to better transfer knowledge to the student network (i.e., learning to teach) using feedback from the performance of the distilled student network in a meta learning framework. Moreover, we introduce a pilot update mechanism to improve the alignment between the inner-learner and the meta-learner in meta learning algorithms that focus on an improved inner-learner. Experiments on various benchmarks show that MetaDistil can yield significant improvements over traditional KD algorithms and is less sensitive to the choice of student capacity and hyperparameters, facilitating the use of KD on different tasks and models.

With the prevalence of large neural networks with millions or billions of parameters, model compression is gaining prominence for enabling efficient, eco-friendly deployment of machine learning applications. Previous works often train a large model as the "teacher", then fix the teacher and train a "student" model to mimic the teacher's behavior, in order to transfer knowledge from the teacher to the student. However, this paradigm has the following drawbacks: (1) The teacher is unaware of the student. Recent studies in pedagogy suggest that student-centered learning, which considers students' characteristics and learning capability, is effective in improving students' performance (Cornelius-White, 2007; Wright, 2011).
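The teacher-update loop summarized in the abstract (a pilot update of a student copy, a meta-update of the teacher from that copy's performance, then the real student update with the updated teacher) can be illustrated on a toy problem. This is a minimal sketch under loudly stated assumptions, not the authors' implementation: the teacher and student are 1-D linear models, all losses are squared errors, the student's objective mixes label loss and distillation loss with a hypothetical weight `alpha`, gradients are derived by hand, and the teacher is updated only from the pilot student's loss on a held-out "quiz" batch. The names `student_grad`, `metadistil_step`, `lr_s`, and `lr_t` are illustrative only.

```python
# Toy sketch of a MetaDistil-style loop with a pilot update.
# Assumptions (not from the paper): 1-D linear teacher and student,
# squared-error losses, a fixed loss mix `alpha`, and hand-derived gradients.

def mean(values):
    return sum(values) / len(values)


def student_grad(w_s, w_t, xs, ys, alpha):
    """Gradient w.r.t. w_s of the student's distillation objective:
    mean(alpha * (w_s*x - y)**2 + (1 - alpha) * (w_s*x - w_t*x)**2)."""
    return mean([2 * alpha * (w_s * x - y) * x
                 + 2 * (1 - alpha) * (w_s - w_t) * x * x
                 for x, y in zip(xs, ys)])


def metadistil_step(w_s, w_t, train, quiz, alpha=0.5, lr_s=0.1, lr_t=0.1):
    xs, ys = train
    qx, qy = quiz
    # 1) Pilot update: a temporary copy of the student takes one distillation step.
    w_pilot = w_s - lr_s * student_grad(w_s, w_t, xs, ys, alpha)
    # 2) Meta-update the teacher with the gradient of the pilot student's quiz
    #    loss mean((w_pilot*x - y)**2). By the chain rule through step 1,
    #    d w_pilot / d w_t = lr_s * 2 * (1 - alpha) * mean(x**2).
    dpilot_dwt = lr_s * 2 * (1 - alpha) * mean([x * x for x in xs])
    dquiz_dpilot = mean([2 * (w_pilot * x - y) * x for x, y in zip(qx, qy)])
    w_t = w_t - lr_t * dquiz_dpilot * dpilot_dwt
    # 3) Discard the pilot and update the real student with the updated teacher.
    w_s = w_s - lr_s * student_grad(w_s, w_t, xs, ys, alpha)
    return w_s, w_t


train = ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # ground truth: y = 2x
quiz = ([1.5, 2.5], [3.0, 5.0])              # held-out "quiz" batch
w_s, w_t = 0.0, 1.0                          # teacher starts with the wrong slope
w_s_fixed, w_t_fixed = 0.0, 1.0
for _ in range(200):
    w_s, w_t = metadistil_step(w_s, w_t, train, quiz)
    # Baseline: lr_t=0 keeps the teacher fixed, as in conventional KD.
    w_s_fixed, w_t_fixed = metadistil_step(w_s_fixed, w_t_fixed, train, quiz, lr_t=0.0)

print(f"MetaDistil: student={w_s:.3f} teacher={w_t:.3f}")
print(f"Fixed teacher: student={w_s_fixed:.3f}")
```

With the teacher frozen at slope 1.0, the student settles at 1.5, halfway between the teacher and the labels because `alpha = 0.5`. When the quiz loss is allowed to update the teacher, the teacher's slope is pulled toward 2.0, since that is what lowers the pilot student's quiz loss, and the student follows. Performing step 3 with the already-updated teacher is the point of the pilot copy: the teacher is adjusted for the student update that actually happens next.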

distillation, metadistil, student

Country:
- North America > United States > California
  - Santa Clara County > Palo Alto (0.04)
  - San Diego County > San Diego (0.04)

Genre:
- Research Report (0.84)

Industry:
- Education
  - Educational Setting (0.88)
  - Educational Technology > Educational Software (0.57)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Optimization (0.46)
  - Machine Learning
    - Statistical Learning (0.68)
    - Neural Networks (0.48)