We thank the reviewers for their comments and actionable suggestions for improving the paper; we paraphrase some comments for brevity. We provide additional results on ResNet-18 for both CIFAR-10 and CIFAR-100. Your intuition is correct: for the baseline, the strong data augmentation prevents [...]. We will include a discussion of ROC and AUC curves for mixup in the final version (R1).
On the Limitations of Temperature Scaling for Distributions with Overlaps
Despite the impressive generalization capabilities of deep neural networks, they have been repeatedly shown to be overconfident when they are wrong. Fixing this issue is known as model calibration, and has consequently received much attention in the form of modified training schemes and post-training calibration procedures such as temperature scaling. While temperature scaling is frequently used because of its simplicity, it is often outperformed by modified training schemes. In this work, we identify a specific bottleneck for the performance of temperature scaling. We show that for empirical risk minimizers for a general set of distributions in which the supports of classes have overlaps, the performance of temperature scaling degrades with the amount of overlap between classes, and asymptotically becomes no better than random when there are a large number of classes. On the other hand, we prove that optimizing a modified form of the empirical risk induced by the Mixup data augmentation technique can in fact lead to reasonably good calibration performance, showing that training-time calibration may be necessary in some situations. We also verify that our theoretical results reflect practice by showing that Mixup significantly outperforms empirical risk minimization (with respect to multiple calibration metrics) on image classification benchmarks with class overlaps introduced in the form of label noise.

The past decade has seen a rapid increase in the prevalence of deep learning models across a variety of applications, in large part due to their impressive predictive accuracy on unseen test data.
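For readers unfamiliar with the post-hoc procedure discussed above, temperature scaling simply divides a trained model's logits by a scalar T (fit on held-out data) before the softmax. The following is a minimal NumPy sketch, not code from the paper; the grid search and the toy NLL objective are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def temperature_scale(logits, T):
    # Divide logits by a scalar temperature T before the softmax;
    # T > 1 softens (less confident), T < 1 sharpens the distribution.
    return softmax(logits / T)

def nll(probs, labels):
    # Negative log-likelihood of the true labels.
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 91)):
    # Pick the T that minimizes validation NLL over a simple grid
    # (real implementations typically use gradient-based optimization).
    return min(grid, key=lambda T: nll(temperature_scale(logits, T), labels))
```

Note that temperature scaling only rescales confidences; it never changes the argmax prediction, which is one reason its calibration power is limited when classes genuinely overlap.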
mixup: Beyond Empirical Risk Minimization
Zhang, Hongyi, Cisse, Moustapha, Dauphin, Yann N., Lopez-Paz, David
Large deep neural networks are powerful, but exhibit undesirable behaviors such as memorization and sensitivity to adversarial examples. In this work, we propose mixup, a simple learning principle to alleviate these issues. In essence, mixup trains a neural network on convex combinations of pairs of examples and their labels. By doing so, mixup regularizes the neural network to favor simple linear behavior in-between training examples. Our experiments on the ImageNet-2012, CIFAR-10, CIFAR-100, Google commands and UCI datasets show that mixup improves the generalization of state-of-the-art neural network architectures. We also find that mixup reduces the memorization of corrupt labels, increases the robustness to adversarial examples, and stabilizes the training of generative adversarial networks.
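The convex-combination idea described in the abstract can be sketched in a few lines of NumPy. This is an illustrative batch-level sketch (with a single mixing coefficient per batch, drawn from Beta(alpha, alpha) as in the paper), not the authors' released implementation.

```python
import numpy as np

def mixup_batch(x, y_onehot, alpha=0.2, rng=None):
    # Mix each example with a randomly permuted partner from the same batch:
    #   x_tilde = lam * x_i + (1 - lam) * x_j
    #   y_tilde = lam * y_i + (1 - lam) * y_j
    # where lam ~ Beta(alpha, alpha).
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    idx = rng.permutation(len(x))
    x_mix = lam * x + (1 - lam) * x[idx]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[idx]
    return x_mix, y_mix
```

Training then proceeds as usual, but on (x_mix, y_mix) with a soft-label loss such as cross-entropy against the mixed targets.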