Asymmetric Temperature Scaling Makes Larger Networks Teach Well Again
Neural Information Processing Systems
Knowledge Distillation (KD) aims to transfer the knowledge of a well-performing neural network (the teacher) to a weaker one (the student).
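As a minimal sketch of the standard KD setup this paper builds on (the temperature-scaled soft-target loss of Hinton et al., not the asymmetric temperature scaling proposed here), the student is trained to match the teacher's softened output distribution; all names and values below are illustrative:

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax: larger T yields a softer distribution.
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def kd_soft_loss(teacher_logits, student_logits, T=4.0):
    # KL(teacher_T || student_T), scaled by T^2 so gradient magnitudes
    # stay comparable across temperatures (Hinton et al., 2015).
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))

teacher = [8.0, 2.0, 1.0]   # confident teacher logits (illustrative)
student = [5.0, 3.0, 2.0]   # less confident student logits
print(kd_soft_loss(teacher, student))
```

Note that with symmetric scaling the same single temperature T is applied to both teacher and student logits; the paper's contribution is to decouple these for large teachers.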