Detecting anomalous inputs is critical for safely deploying deep learning models in the real world. Existing approaches for detecting out-of-distribution (OOD) examples work well when evaluated on natural samples drawn from a sufficiently different distribution than the training data distribution. However, in this paper, we show that existing detection mechanisms can be extremely brittle when evaluating on inputs with minimal adversarial perturbations which don't change their semantics. Formally, we introduce a novel and challenging problem, Robust Out-of-Distribution Detection, and propose an algorithm that can fool existing OOD detectors by adding small perturbations to the inputs while preserving their semantics and thus the distributional membership. We take a first step to solve this challenge, and propose an effective algorithm called ALOE, which performs robust training by exposing the model to both adversarially crafted inlier and outlier examples. Our method can be flexibly combined with, and render existing methods robust. On common benchmark datasets, we show that ALOE substantially improves the robustness of state-of-the-art OOD detection, with 58.4% AUROC improvement on CIFAR-10 and 46.59% improvement on CIFAR-100. Finally, we provide theoretical analysis for our method, underpinning the empirical results above.
The effective application of neural networks in the real-world relies on proficiently detecting out-of-distribution examples. Contemporary methods seek to model the distribution of feature activations in the training data for adequately distinguishing abnormalities, and the state-of-the-art method uses Gaussian distribution models. In this work, we present a novel approach that improves upon the state-of-the-art by leveraging an expressive density model based on normalizing flows. We introduce the residual flow, a novel flow architecture that learns the residual distribution from a base Gaussian distribution. Our model is general, and can be applied to any data that is approximately Gaussian. For novelty detection in image datasets, our approach provides a principled improvement over the state-of-the-art. Specifically, we demonstrate the effectiveness of our method in ResNet and DenseNet architectures trained on various image datasets. For example, on a ResNet trained on CIFAR-100 and evaluated on detection of out-of-distribution samples from the ImageNet dataset, holding the true positive rate (TPR) at $95\%$, we improve the true negative rate (TNR) from $56.7\%$ (current state-of-the-art) to $77.5\%$ (ours).
Recently, many methods to reduce neural networks uncertainty have been proposed. However, most of the techniques used in these solutions usually present severe drawbacks. In this paper, we argue that neural networks low out-of-distribution detection performance is mainly due to the SoftMax loss anisotropy. Therefore, we built an isotropic loss to reduce neural networks uncertainty in a fast, scalable, turnkey, and native approach. Our experiments show that replacing SoftMax with the proposed loss does not affect classification accuracy. Moreover, our proposal overcomes ODIN typically by a large margin while producing usually competitive results against a state-of-the-art Mahalanobis method despite avoiding their limitations. Hence, neural networks uncertainty may be significantly reduced by a simple loss change without relying on special procedures such as data augmentation, adversarial training/validation, ensembles, or additional classification/regression models.
The reliable measurement of confidence in classifiers' predictions is very important for many applications and is, therefore, an important part of classifier design. Yet, although deep learning has received tremendous attention in recent years, not much progress has been made in quantifying the prediction confidence of neural network classifiers. Bayesian models offer a mathematically grounded framework to reason about model uncertainty, but usually come with prohibitive computational costs. In this paper we propose a simple, scalable method to achieve a reliable confidence score, based on the data embedding derived from the penultimate layer of the network. We investigate two ways to achieve desirable embeddings, by using either a distance-based loss or Adversarial Training. We then test the benefits of our method when used for classification error prediction, weighting an ensemble of classifiers, and novelty detection. In all tasks we show significant improvement over traditional, commonly used confidence scores.
Deep learning models are known to be overconfident in their predictions on out of distribution inputs. This is a challenge when a model is trained on a particular input dataset, but receives out of sample data when deployed in practice. Recently, there has been work on building classifiers that are robust to out of distribution samples by adding a regularization term that maximizes the entropy of the classifier output on out of distribution data. However, given the challenge that it is not always possible to obtain out of distribution samples, the authors suggest a GAN based alternative that is independent of specific knowledge of out of distribution samples. From this existing work, we also know that having access to the true out of sample distribution for regularization works significantly better than using samples from the GAN. In this paper, we make the following observation: in practice, the out of distribution samples are contained in the traffic that hits a deployed classifier. However, the traffic will also contain a unknown proportion of in-distribution samples. If the entropy over of all of the traffic data were to be naively maximized, this will hurt the classifier performance on in-distribution data. To effectively leverage this traffic data, we propose an adaptive regularization technique (based on the maximum predictive probability score of a sample) which penalizes out of distribution samples more heavily than in distribution samples in the incoming traffic. This ensures that the overall performance of the classifier does not degrade on in-distribution data, while detection of out-of-distribution samples is significantly improved by leveraging the unlabeled traffic data. We show the effectiveness of our method via experiments on natural image datasets.