Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification
Bai, Yu, Mei, Song, Wang, Huan, Xiong, Caiming
Modern machine learning models such as deep neural networks with high accuracy tend to be miscalibrated: the predicted top probability (confidence) does not reflect the actual accuracy of the model, and tends to be over-confident. For example, a WideResNet 32 on CIFAR100 has on average a predicted top probability of 87%, while the actual test accuracy is only 72% (Guo et al., 2017). As the confidence is often interpreted as an estimate of the true accuracy, such over-confidence could be dangerous, especially in risk-sensitive domains such as medical AI (Begoli et al., 2019) and self-driving cars (Michelmore et al., 2018). To address this issue, there is a growing line of research on improving the calibration of models, either by recalibrating well-trained models to adjust the confidence scores (Platt et al., 1999; Zadrozny and Elkan; Naeini et al., 2015; Guo et al., 2017), or by averaging the predictions over multiple models to make the confidence scores more accurate (Lakshminarayanan et al., 2016; Gal and Ghahramani, 2016). These methods can in general reduce the over-confidence and improve the calibration of the model, while preserving (or even improving) the model's accuracy (Ovadia et al., 2019). Despite this progress, the more fundamental question of why such over-confidence arises in models trained in the standard (vanilla) way is still not satisfactorily understood.
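To make the notion of over-confidence in the paragraph above concrete, the following minimal Python sketch computes the gap between average confidence and accuracy, plus a binned expected calibration error (ECE). The function names, the choice of ECE as a summary metric, the bin count, and the toy data are all illustrative assumptions, not taken from the paper.

```python
def confidence_gap(confidences, correct):
    """Mean predicted confidence minus accuracy.

    A positive value means the model is over-confident on average.
    `confidences` holds top predicted probabilities in [0, 1];
    `correct` holds 1 for a correct prediction and 0 otherwise.
    """
    n = len(confidences)
    mean_confidence = sum(confidences) / n
    accuracy = sum(correct) / n
    return mean_confidence - accuracy


def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted average of |confidence - accuracy| per bin.

    Equal-width bins over [0, 1] are an assumption of this sketch.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        bin_confidence = sum(c for c, _ in members) / len(members)
        bin_accuracy = sum(h for _, h in members) / len(members)
        ece += (len(members) / n) * abs(bin_confidence - bin_accuracy)
    return ece


# Hypothetical toy predictions (not data from the paper).
toy_confidences = [0.9, 0.9, 0.8, 0.8]
toy_correct = [1, 0, 1, 1]
gap = confidence_gap(toy_confidences, toy_correct)
ece = expected_calibration_error(toy_confidences, toy_correct)
```

In this sketch, the WideResNet example from the text would correspond to a gap of roughly 0.87 - 0.72 = 0.15. The gap captures only average over-confidence, whereas the binned ECE also penalizes bins that are under-confident, so the two can differ.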
Feb-15-2021
- Country:
  - North America > United States
    - California > Alameda County > Berkeley (0.04)
  - Europe > United Kingdom
    - England > Cambridgeshire > Cambridge (0.04)
  - North America > United States
- Genre:
  - Research Report > New Finding (0.71)
- Industry:
  - Information Technology > Robotics & Automation (0.34)
  - Transportation > Ground
    - Road (0.34)
- Technology: