Reviews: A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks

Neural Information Processing Systems 

The paper addresses the problem of detecting abnormal inputs to deep neural networks: out-of-distribution samples, adversarial examples, and samples from new classes (for class-incremental learning). To achieve this, the authors fit class-conditional Gaussian distributions with a tied covariance (as in linear discriminant analysis) to the features at various layers of a target neural network, thereby modeling the distribution of in-distribution (inlier) inputs. They use the Mahalanobis distance to the closest class-conditional Gaussian as a confidence score; its negative is proportional, up to constants, to the log-likelihood under that Gaussian. They further enhance the score by taking a Fast Gradient Sign Method-style step in the input space that increases the confidence score. Finally, they combine the scores gathered at different layers of the network through a weighted linear combination, whose weights are learned with a logistic regression detector on validation samples.
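To make the summary concrete, below is a minimal PyTorch sketch of the pipeline as I understand it: fitting the tied-covariance class-conditional Gaussians, scoring with the Mahalanobis distance, the FGSM-style input pre-processing, and the per-layer score combination. All function names (`fit_class_gaussians`, `mahalanobis_confidence`, `preprocess`, `combine_layer_scores`) and hyperparameter values are illustrative assumptions of mine, not the authors' code.

```python
# Minimal sketch of the reviewed method; names and hyperparameters are
# illustrative assumptions, not the authors' implementation.
import torch

def fit_class_gaussians(features, labels, num_classes):
    """Fit class-conditional Gaussians sharing one (tied) covariance, as in LDA."""
    means = torch.stack([features[labels == c].mean(dim=0)
                         for c in range(num_classes)])
    centered = features - means[labels]                # subtract each class mean
    cov = centered.t() @ centered / features.shape[0]  # tied covariance estimate
    precision = torch.linalg.pinv(cov)                 # (pseudo-)inverse for scoring
    return means, precision

def mahalanobis_confidence(feats, means, precision):
    """Negative Mahalanobis distance to the closest class mean; up to constants,
    proportional to the class-conditional Gaussian log-likelihood."""
    diffs = feats.unsqueeze(1) - means.unsqueeze(0)    # (batch, class, dim)
    sq_dists = torch.einsum('bcd,de,bce->bc', diffs, precision, diffs)
    return -sq_dists.min(dim=1).values

def preprocess(x, feature_fn, means, precision, eps=0.002):
    """FGSM-style step in input space that *increases* the confidence score."""
    x = x.clone().requires_grad_(True)
    score = mahalanobis_confidence(feature_fn(x), means, precision).sum()
    grad, = torch.autograd.grad(score, x)
    return (x + eps * grad.sign()).detach()

def combine_layer_scores(layer_scores, weights):
    """Final detector: linear combination of per-layer confidence scores."""
    return sum(w * s for w, s in zip(weights, layer_scores))
```

At test time, each layer's score would be computed on the pre-processed input and the weighted scores summed; per the paper, the layer weights are learned by fitting a logistic regression detector on validation samples.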