Self-Distillation Amplifies Regularization in Hilbert Space

Oct-2-2025, 11:11:23 GMT–Neural Information Processing Systems

Knowledge distillation, introduced in the deep learning context, is a method to transfer knowledge from one architecture to another. When the two architectures are identical, the procedure is called self-distillation. The idea is to feed the predictions of the trained model back in as new target values for retraining, possibly iterating this loop a few times. It has been empirically observed that the self-distilled model often achieves higher accuracy on held-out data.
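The retraining loop described above can be sketched in a few lines. This is only an illustrative sketch, not the paper's implementation: it assumes kernel ridge regression with an RBF kernel as the model, and the function names (`rbf_kernel`, `fit_predict`, `self_distill`), the synthetic data, and the values of `length_scale`, `ridge`, and `rounds` are all hypothetical choices made for the example.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.5):
    # Pairwise squared distances between 1-D inputs, mapped through an RBF.
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2.0 * length_scale ** 2))

def fit_predict(K, targets, ridge):
    # Kernel ridge regression: solve (K + ridge * I) alpha = targets,
    # then return the fitted values on the training inputs.
    alpha = np.linalg.solve(K + ridge * np.eye(K.shape[0]), targets)
    return K @ alpha

def self_distill(x, y, rounds=3, ridge=0.1):
    # Assumed setup: the same model (same kernel, same ridge) is refit
    # each round, using the previous round's predictions as new targets.
    K = rbf_kernel(x, x)
    targets = y
    history = []
    for _ in range(rounds):
        targets = fit_predict(K, targets, ridge)
        history.append(targets)
    return history

# Synthetic noisy data, purely for illustration.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(x.shape)

history = self_distill(x, y)
norms = [float(np.linalg.norm(h)) for h in history]
print(norms)
```

In this assumed setting each round applies the same linear smoother to the previous targets, so the norm of the fitted values cannot grow from one round to the next. That is one simple way to see repeated retraining acting as additional regularization.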

artificial intelligence, distillation, machine learning


