Asymmetric Temperature Scaling Makes Larger Networks Teach Well Again
Neural Information Processing Systems
Knowledge Distillation (KD) aims to transfer the knowledge of a well-performing neural network (the teacher) to a weaker one (the student).
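As background, the standard temperature-scaled distillation loss (Hinton et al.) softens both the teacher's and the student's logits with a single shared temperature T before matching them with a KL divergence. The sketch below shows only this symmetric baseline, not the paper's asymmetric variant; the function names and the choice T = 4 are illustrative assumptions.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax (shifted by the max for numerical stability)."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Standard distillation loss: KL(teacher_T || student_T) scaled by T^2.

    The T^2 factor keeps gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, T)  # softened teacher distribution
    q = softmax(student_logits, T)  # softened student distribution
    return float(np.sum(p * (np.log(p) - np.log(q))) * T ** 2)
```

A higher T flattens both distributions, exposing the teacher's relative confidences over wrong classes (the "dark knowledge"); the paper's contribution concerns how a single shared temperature behaves when the teacher is much larger than the student.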