Self-Distillation Amplifies Regularization in Hilbert Space
Neural Information Processing Systems
Knowledge distillation, as introduced in the deep learning context, is a method for transferring knowledge from one architecture to another. In particular, when the two architectures are identical, the procedure is called self-distillation. The idea is to feed the predictions of the trained model back in as new target values for retraining (and possibly to iterate this loop a few times). It has been empirically observed that the self-distilled model often achieves higher accuracy on held-out data.
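A minimal sketch of the self-distillation loop described above, in the kernel ridge regression setting the paper analyzes. The synthetic data, kernel choice, hyperparameters, and number of rounds are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 1))                        # training inputs
y = np.sin(3 * X).ravel() + 0.3 * rng.standard_normal(50)   # noisy targets

targets = y
for step in range(3):                  # a few self-distillation rounds
    model = KernelRidge(alpha=0.1, kernel="rbf", gamma=5.0)
    model.fit(X, targets)              # retrain on the current targets
    targets = model.predict(X)         # the model's own predictions become
                                       # the targets for the next round
```

Each round replaces the targets with the previous model's fitted values, so the regularizer acts repeatedly on the solution; this is the amplification effect the title refers to.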