Monotonic Risk Relationships under Distribution Shifts for Regularized Risk Minimization
Daniel LeJeune, Jiayu Liu, Reinhard Heckel
Machine learning models are typically evaluated by shuffling a set of labeled data, splitting it into training and test sets, training the model on the training set, and evaluating it on the test set. This measures how well the model performs on the distribution it was trained on. In practice, however, a model is most commonly not applied to such in-distribution data, but rather to out-of-distribution data that is almost always at least slightly different. In order to understand the performance of machine learning methods in practice, it is therefore important to understand how out-of-distribution performance relates to in-distribution performance. While there are settings in which models with similar in-distribution performance have different out-of-distribution performance (McCoy et al., 2020), a series of recent empirical studies have shown that the in-distribution and out-of-distribution performances of models are often strongly correlated: Recht et al. (2019), Yadav and Bottou (2019), and Miller et al. (2020) constructed new test sets for the popular CIFAR-10, ImageNet, and MNIST image classification problems and for the SQuAD question answering dataset by following the original data collection and labeling process as closely as possible. For CIFAR-10 and ImageNet, performance drops significantly when models are evaluated on the new test set, indicating that even when the original data collection and labeling process is followed, a significant distribution shift can occur. In addition, for all four distribution shifts, the in- and out-of-distribution errors are strongly linearly correlated.
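To make the setup concrete, the following is a minimal sketch of the kind of relationship the abstract describes, not the paper's experimental protocol: it assumes ridge regression as the regularized model family, a synthetic linear data model, and a simple covariate shift implemented by rescaling the features. Sweeping the regularization strength traces out in-distribution (ID) and out-of-distribution (OOD) risks across the model family and reports how strongly the two are correlated.

```python
# Illustrative sketch (assumed setup, not the paper's experiments):
# vary ridge regularization, measure ID and OOD test error, and check
# how strongly the two risks are linearly related across models.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
d, n_train, n_test = 50, 200, 1000
w = rng.standard_normal(d) / np.sqrt(d)          # ground-truth linear model

def sample(n, scale):
    # "scale" implements a simple covariate shift (feature rescaling)
    X = scale * rng.standard_normal((n, d))
    y = X @ w + 0.1 * rng.standard_normal(n)
    return X, y

X_tr, y_tr = sample(n_train, scale=1.0)          # training distribution
X_id, y_id = sample(n_test, scale=1.0)           # in-distribution test set
X_ood, y_ood = sample(n_test, scale=1.5)         # shifted (OOD) test set

id_err, ood_err = [], []
for alpha in np.logspace(-3, 3, 20):             # sweep regularization strength
    model = Ridge(alpha=alpha).fit(X_tr, y_tr)
    id_err.append(np.mean((model.predict(X_id) - y_id) ** 2))
    ood_err.append(np.mean((model.predict(X_ood) - y_ood) ** 2))

# Correlation between ID and OOD risks across the regularized family
print("Pearson correlation:", np.corrcoef(id_err, ood_err)[0, 1])
```

The choice of ridge regression and a rescaling shift is only for illustration; the empirical studies cited above observe analogous linear relationships for image classification and question answering models under naturally occurring dataset shifts.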
arXiv.org Artificial Intelligence
Jul-20-2023