lsun
Single Layer Predictive Normalized Maximum Likelihood for Out-of-Distribution Detection: Supplementary Material
We use the same notation as in Section 4.2. Denoting $e_c$ as the one-hot row vector of the true label, we define the hypothesis set that the genie is allowed to choose from as $P_\Theta = \left\{ p_\theta(y|x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{1}{2\sigma^2} \left( y - f(x_n^\top \theta)\, e_c^\top \right)^2 \right) \right\}$. We simulate the response of the pNML regret for two classes ($C = 2$) and divide it by $\log C$ so that the regret is bounded between 0 and 1. Figure 1 shows the regret behaviour for different $p_1$ (the ERM probability assignment of class 1) as a function of $x^\top g$. For an ERM model that is certain of its prediction ($p_1 = 0.99$, represented by the purple curve), a slight variation of $x^\top g$ causes a large response in the regret compared to $p_1$ values of 0.55 and 0.85. Next, we compute the correlation matrix of the training embeddings and perform an SVD. For the SVHN training set, most of the energy is located in the first 50 eigenvalues, after which there is a significant decrease of approximately $10^3$. The same phenomenon is also seen in Figure 2a, which shows the eigenvalues of the ResNet-40 model.
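The spectrum analysis described above (correlation matrix of the training embeddings followed by an SVD) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `embedding_spectrum` is hypothetical, and the synthetic low-rank matrix merely stands in for real last-layer embeddings such as those of an SVHN-trained model.

```python
import numpy as np

def embedding_spectrum(embeddings):
    """Eigenvalues (descending) of the empirical correlation matrix X^T X / N."""
    n_samples = embeddings.shape[0]
    corr = embeddings.T @ embeddings / n_samples
    # corr is symmetric positive semi-definite, so its singular values
    # coincide with its eigenvalues; SVD returns them in descending order.
    return np.linalg.svd(corr, compute_uv=False)

rng = np.random.default_rng(0)
# Assumption: synthetic embeddings with 8 strong directions plus weak noise,
# standing in for embeddings extracted from a pretrained network.
strong = rng.normal(size=(1000, 8)) @ rng.normal(size=(8, 64))
weak = 1e-2 * rng.normal(size=(1000, 64))
eigvals = embedding_spectrum(strong + weak)

# Cumulative fraction of the total energy captured by the leading eigenvalues.
energy = np.cumsum(eigvals) / eigvals.sum()
print("leading eigenvalues:", eigvals[:10])
print("energy in first 8 eigenvalues:", energy[7])
```

On real embeddings, plotting `eigvals` on a logarithmic axis would make a sharp drop such as the one reported for SVHN easy to spot.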
Single Layer Predictive Normalized Maximum Likelihood for Out-of-Distribution Detection
Detecting out-of-distribution (OOD) samples is vital for developing machine learning based models for critical safety systems. Common approaches for OOD detection assume access to some OOD samples during training which may not be available in a real-life scenario. Instead, we utilize the predictive normalized maximum likelihood (pNML) learner, in which no assumptions are made on the tested input. We derive an explicit expression of the pNML and its generalization error, denoted as the regret, for a single layer neural network (NN). We show that this learner generalizes well when (i) the test vector resides in a subspace spanned by the eigenvectors associated with the large eigenvalues of the empirical correlation matrix of the training data, or (ii) the test sample is far from the decision boundary. Furthermore, we describe how to efficiently apply the derived pNML regret to any pretrained deep NN, by employing the explicit pNML for the last layer, followed by the softmax function. Applying the derived regret to deep NN requires neither additional tunable parameters nor extra data. We extensively evaluate our approach on 74 OOD detection benchmarks using DenseNet-100, ResNet-34, and WideResNet40 models trained with CIFAR-100, CIFAR-10, SVHN, and ImageNet-30 showing a significant improvement of up to 15.6% over recent leading methods.
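Condition (i) above can be made concrete with a small sketch that measures how much of a test vector lies in the subspace spanned by the eigenvectors with the largest eigenvalues. This is only an illustration of that geometric condition and does not compute the pNML regret itself; the helper `top_subspace_energy`, the choice of `k`, and the synthetic data are all assumptions.

```python
import numpy as np

def top_subspace_energy(train_embeddings, x, k):
    """Fraction of ||x||^2 lying in the span of the top-k eigenvectors of X^T X / N."""
    corr = train_embeddings.T @ train_embeddings / train_embeddings.shape[0]
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    top = eigvecs[:, -k:]                    # eigenvectors of the k largest eigenvalues
    proj = top.T @ x                         # coordinates of x in that subspace
    return float(proj @ proj / (x @ x))

rng = np.random.default_rng(0)
# Assumption: training embeddings concentrated near a 4-dimensional subspace.
basis = rng.normal(size=(4, 32))
train = rng.normal(size=(500, 4)) @ basis + 1e-3 * rng.normal(size=(500, 32))

x_in = rng.normal(size=4) @ basis   # test vector inside the training subspace
x_out = rng.normal(size=32)         # generic test vector, mostly outside it

frac_in = top_subspace_energy(train, x_in, k=4)
frac_out = top_subspace_energy(train, x_out, k=4)
print("in-subspace fraction:", frac_in)
print("generic-vector fraction:", frac_out)
```

A fraction close to 1 corresponds to the case the abstract describes as generalizing well; a small fraction indicates a test vector dominated by directions the training data barely covers.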
Quantifying the Prediction Uncertainty of Machine Learning Models for Individual Data
Machine learning models have exhibited exceptional results in various domains. The most prevalent approach for learning is the empirical risk minimizer (ERM), which adapts the model's weights to reduce the loss on a training set and subsequently leverages these weights to predict the label for new test data. Nonetheless, ERM makes the assumption that the test distribution is similar to the training distribution, which may not always hold in real-world situations. In contrast, the predictive normalized maximum likelihood (pNML) was proposed as a min-max solution for the individual setting where no assumptions are made on the distribution of the tested input. This study investigates pNML's learnability for linear regression and neural networks, and demonstrates that pNML can improve the performance and robustness of these models on various tasks. Moreover, the pNML provides an accurate confidence measure for its output, showcasing state-of-the-art results for out-of-distribution detection, resistance to adversarial attacks, and active learning.
Reviews: Memory Replay GANs: Learning to Generate New Categories without Forgetting
Update following the author rebuttal: I would like to thank the authors for their thoughtful rebuttal. I feel like they appropriately addressed the main points I raised, namely the incomplete evaluation and the choice of GANs over other generative model families, and I'm inclined to recommend the paper's acceptance. I updated my review score accordingly. The paper is well-written and its exposition of the problem, proposed solution, and related work is clear. Starting from the AC-GAN conditional generative modeling formulation, the authors introduce the notion of a sequence of tasks by modeling image classes (for MNIST, SVHN, and LSUN) in sequence, where the model for each class in the sequence is initialized with the model parameters for the previous class in the sequence.
Strategies and impact of learning curve estimation for CNN-based image classification
Didyk, Laura, Yarish, Brayden, Beck, Michael A., Bidinosti, Christopher P., Henry, Christopher J.
Learning curves measure how the performance of machine learning models improves with the volume of training data. Over a wide variety of applications and models, it has been observed that learning curves follow, to a large extent, a power-law behavior. This makes the performance of different models for a given task somewhat predictable and opens the opportunity to reduce the training time for practitioners who are exploring the space of possible models and hyperparameters for the problem at hand. By estimating the learning curve of a model from training on small subsets of data, only the best models need to be considered for training on the full dataset. How to choose subset sizes and how often to sample models on these to obtain estimates has, however, not been researched. Given that the goal is to reduce overall training time, strategies are needed that sample the performance in a time-efficient way and yet lead to accurate learning curve estimates. In this paper, we formulate the framework for these strategies and propose several of them. Further, we evaluate the strategies on simulated learning curves and in experiments with popular datasets and models for image classification tasks.
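The idea of estimating a learning curve from small subsets can be sketched with a simple power-law fit. This is a minimal example under stated assumptions, not one of the paper's sampling strategies: it assumes a pure power law `error ≈ a * n**(-b)` with no offset term, and the subset sizes and error values are made-up, noise-free placeholders.

```python
import numpy as np

def fit_power_law(sizes, errors):
    """Fit errors ≈ a * sizes**(-b) by least squares in log-log space."""
    slope, intercept = np.polyfit(np.log(sizes), np.log(errors), deg=1)
    return float(np.exp(intercept)), float(-slope)

def predict_error(a, b, size):
    """Extrapolate the fitted curve to a larger training-set size."""
    return a * size ** (-b)

# Hypothetical measurements: error observed after training on small subsets.
sizes = np.array([500.0, 1000.0, 2000.0, 4000.0])
errors = 2.0 * sizes ** (-0.35)

a, b = fit_power_law(sizes, errors)
full_size_estimate = predict_error(a, b, 50_000.0)
print("a =", a, "b =", b)
print("extrapolated error at 50k samples:", full_size_estimate)
```

With real measurements the points are noisy, so how many subset sizes to sample and how often to repeat each one affects the fit, which is the question the abstract raises.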
Out of Distribution Detection via Neural Network Anchoring
Anirudh, Rushil, Thiagarajan, Jayaraman J.
Our goal in this paper is to exploit heteroscedastic temperature scaling as a calibration strategy for out of distribution (OOD) detection. Heteroscedasticity here refers to the fact that the optimal temperature parameter for each sample can be different, as opposed to conventional approaches that use the same value for the entire distribution. To enable this, we propose a new training strategy called anchoring that can estimate appropriate temperature values for each sample, leading to state-of-the-art OOD detection performance across several benchmarks. Using NTK theory, we show that this temperature function estimate is closely linked to the epistemic uncertainty of the classifier, which explains its behavior. In contrast to some of the best-performing OOD detection approaches, our method does not require exposure to additional outlier datasets, custom calibration objectives, or model ensembling. Through empirical studies with different OOD detection settings -- far OOD, near OOD, and semantically coherent OOD -- we establish a highly effective OOD detection approach. Code to reproduce our results is available at github.com/LLNL/AMP
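The difference between a single global temperature and a per-sample (heteroscedastic) temperature can be illustrated with a short sketch. It does not implement the anchoring training strategy, which the abstract does not detail; the function `tempered_softmax` and the temperature values are hypothetical and chosen only to show the effect.

```python
import numpy as np

def tempered_softmax(logits, temperature):
    """Softmax of logits / T, where T is a scalar or one value per sample."""
    t = np.asarray(temperature, dtype=float).reshape(-1, 1)
    z = logits / t
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

# Two samples with identical logits (hypothetical values).
logits = np.array([[4.0, 1.0, 0.0],
                   [4.0, 1.0, 0.0]])

# Conventional scaling: one temperature shared by every sample.
global_probs = tempered_softmax(logits, 1.0)

# Heteroscedastic scaling: the second sample is assigned a higher temperature,
# which flattens its predictive distribution and lowers its confidence.
per_sample_probs = tempered_softmax(logits, [1.0, 5.0])

print("global max prob:", global_probs.max(axis=1))
print("per-sample max prob:", per_sample_probs.max(axis=1))
```

In an OOD detector built this way, the maximum probability (or the temperature itself) could serve as the score, with the per-sample temperature supplied by whatever estimator the method trains.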