Towards the Fundamental Limits of Knowledge Transfer over Finite Domains

Qingyue Zhao, Banghua Zhu

arXiv.org Machine Learning 

It has become common sense that transferring as much of a teacher's intrinsic information as possible can expedite a student's learning progress, especially in machine learning, where versatile and powerful teacher models abound. Learning with such assistance has been coined knowledge distillation (KD) (Hinton et al., 2015; Lopez-Paz et al., 2015), a famous paradigm of knowledge transfer with remarkable empirical effectiveness in classification tasks across various downstream applications (Gou et al., 2021; Wang and Yoon, 2021; Gu et al., 2023b). The term distillation implies a belief that the inscrutable teacher(s) may possess useful yet complicated structural information, which we should be able to compress and inject into a compact model, i.e., the student (Breiman and Shang, 1996; Buciluǎ et al., 2006; Li et al., 2014; Ba and Caruana, 2014; Allen-Zhu and Li, 2020). This belief has guided the community towards a line of knowledge transfer methods that exploit teacher training details or snapshots, such as the original training set, the intermediate activations, the last-layer logits (for a probabilistic classifier), first- or second-order derivative or statistical information, and even task-specific knowledge (Hinton et al., 2015; Furlanello et al., 2018; Cho and Hariharan, 2019; Zhao et al., 2022; Romero et al., 2014; Zagoruyko and Komodakis, 2016).
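As a concrete illustration of the logit-based transfer mentioned above, the following is a minimal PyTorch sketch of the classic soft-target distillation loss of Hinton et al. (2015), which blends hard-label cross-entropy with a temperature-softened KL term against the teacher's logits. The function name, temperature `T`, and mixing weight `alpha` are illustrative assumptions, not details taken from this paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Soft-target KD loss in the style of Hinton et al. (2015).

    Combines cross-entropy on hard labels with a KL divergence that
    matches the student's temperature-softened output distribution
    to the teacher's. T and alpha are hypothetical hyperparameters.
    """
    # Standard hard-label cross-entropy on the student's raw logits.
    ce = F.cross_entropy(student_logits, labels)
    # KL divergence between temperature-softened distributions; the
    # T**2 factor keeps gradient magnitudes comparable across temperatures.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T ** 2)
    return alpha * ce + (1.0 - alpha) * kd
```

In this formulation, `alpha = 1.0` recovers ordinary supervised training, while smaller values weight the teacher's soft targets more heavily.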
