Born Again Neural Networks

Furlanello, Tommaso; Lipton, Zachary C.; Tschannen, Michael; Itti, Laurent; Anandkumar, Anima

May-12-2018–arXiv.org Artificial Intelligence

Knowledge distillation (KD) consists of transferring knowledge from one machine learning model (the teacher) to another (the student). Commonly, the teacher is a high-capacity model with formidable performance, while the student is more compact. By transferring knowledge, one hopes to benefit from the student's compactness. We study KD from a new perspective: rather than compressing models, we train students parameterized identically to their teachers. Surprisingly, these Born-Again Networks (BANs) outperform their teachers significantly, both on computer vision and language modeling tasks. Our experiments with BANs based on DenseNets demonstrate state-of-the-art performance on the CIFAR-10 and CIFAR-100 datasets, with validation errors of 3.5% and 15.5% respectively. Additional experiments explore two distillation objectives: (i) Confidence-Weighted by Teacher Max (CWTM) and (ii) Dark Knowledge with Permuted Predictions (DKPP). Both methods elucidate the essential components of KD, demonstrating a role of the teacher outputs on both predicted and non-predicted classes. We present experiments with students of various capacities, focusing on the under-explored case where students overpower teachers. Our experiments show significant advantages from transferring knowledge between DenseNets and ResNets in either direction.
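The abstract names the objectives without giving formulas, so the following is only a minimal sketch of how they might look, not the authors' implementation. It assumes the usual KD formulation (cross-entropy between the teacher's and the student's softmax outputs), reads CWTM as weighting each example's hard-label loss by the teacher's maximum predicted probability, and reads DKPP as keeping the teacher's top prediction in place while shuffling the remaining probabilities. All function names are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits):
    """Cross-entropy of the student's distribution against the teacher's.

    Assumption: standard KD objective, no temperature scaling.
    """
    p_teacher = softmax(teacher_logits)
    p_student = softmax(student_logits)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))


def cwtm_weights(teacher_logits_batch):
    """CWTM-style weights (assumed reading): one weight per example,
    proportional to the teacher's max probability, normalized over the batch."""
    confidences = [max(softmax(logits)) for logits in teacher_logits_batch]
    total = sum(confidences)
    return [c / total for c in confidences]


def permute_non_argmax(probs, rng):
    """DKPP-style target (assumed reading): keep the teacher's top
    probability at its index and shuffle all other probabilities."""
    top = max(range(len(probs)), key=probs.__getitem__)
    rest = [p for i, p in enumerate(probs) if i != top]
    rng.shuffle(rest)
    # Re-insert the top probability at its original position.
    return rest[:top] + [probs[top]] + rest[top:]


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]
    student = [1.5, 0.7, -0.5]
    print("KD loss:", distillation_loss(student, teacher))
    print("CWTM weights:", cwtm_weights([teacher, [0.0, 0.0, 0.0]]))
    print("DKPP target:", permute_non_argmax(softmax(teacher), random.Random(0)))
```

In a born-again setup the student would share the teacher's architecture, so only the training target changes; the two variants isolate the contribution of the teacher's confidence on the predicted class (CWTM) from that of its outputs on the non-predicted classes (DKPP).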

Country:
- Oceania > Fiji (0.04)
- North America > United States
  - California
    - Santa Clara County > Palo Alto (0.04)
    - Los Angeles County
      - Los Angeles (0.28)
      - Pasadena (0.04)
- Europe
  - Switzerland (0.04)
  - Sweden > Stockholm
    - Stockholm (0.04)

Genre:
- Research Report > New Finding (0.46)

Industry:
- Education (0.69)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Neural Networks > Deep Learning (1.00)
  - Statistical Learning (0.93)