Towards the Fundamental Limits of Knowledge Transfer over Finite Domains
It has become common sense that transferring a teacher's intrinsic information to the greatest extent can expedite a student's learning progress, especially in machine learning given versatile and powerful teacher models. Learning with their assistance has been coined knowledge distillation (KD) (Hinton et al., 2015; Lopez-Paz et al., 2015), a famous paradigm of knowledge transfer leading to remarkable empirical effectiveness in classification tasks across various downstream applications (Gou et al., 2021; Wang and Yoon, 2021; Gu et al., 2023b). The term distillation reflects a belief that the inscrutable teacher(s) may possess useful yet complicated structural information, which we should be able to compress and inject into a compact student model (Breiman and Shang, 1996; Buciluǎ et al., 2006; Li et al., 2014; Ba and Caruana, 2014; Allen-Zhu and Li, 2020). This has guided the community towards a line of knowledge transfer methods featuring awareness of teacher training details or snapshots, such as the original training set, the intermediate activations, the last-layer logits (for a probabilistic classifier), the first- or second-order derivative or statistical information, and even task-specific knowledge (Hinton et al., 2015; Furlanello et al., 2018; Cho and Hariharan, 2019; Zhao et al., 2022; Romero et al., 2014; Zagoruyko and Komodakis, 2016;
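To make the logit-based transfer concrete, the following is a minimal sketch of the classic Hinton-style distillation objective: a temperature-softened KL term matching the student to the teacher's logits, blended with the usual cross-entropy on hard labels. This is an illustrative NumPy implementation under standard assumptions (the function names, temperature `T`, and mixing weight `alpha` are ours, not from this paper), not the method studied here.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T yields softer distributions.
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Hinton-style distillation loss: a convex combination of
    (i) KL divergence between softened teacher and student predictions and
    (ii) standard cross-entropy on the hard labels."""
    p_t = softmax(teacher_logits, T)   # softened teacher targets
    p_s = softmax(student_logits, T)   # softened student predictions
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    q_s = softmax(student_logits)      # unsoftened student predictions
    ce = -np.log(q_s[np.arange(len(labels)), labels] + 1e-12)
    # T**2 rescales the soft-target gradient magnitude (Hinton et al., 2015).
    return np.mean(alpha * (T ** 2) * kl + (1 - alpha) * ce)
```

When the student's logits exactly match the teacher's, the KL term vanishes and only the hard-label cross-entropy remains, so the loss degenerates to ordinary supervised training.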
Nov-14-2023