Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness
Bayesian neural networks (BNNs) and deep ensembles are principled approaches to estimating the predictive uncertainty of a deep learning model. However, their practicality in real-time, industrial-scale applications is limited by their heavy memory and inference cost. This motivates us to study principled approaches to high-quality uncertainty estimation that require only a single deep neural network (DNN). By formalizing uncertainty quantification as a minimax learning problem, we first identify input distance awareness, i.e., the model's ability to quantify the distance of a testing example from the training data in the input space, as a necessary condition for a DNN to achieve high-quality (i.e., minimax optimal) uncertainty estimation. We then propose Spectral-normalized Neural Gaussian Process (SNGP), a simple method that improves the distance-awareness ability of modern DNNs by adding a weight normalization step during training and replacing the output layer. On a suite of vision and language understanding tasks and on modern architectures (Wide-ResNet and BERT), SNGP is competitive with deep ensembles in prediction, calibration, and out-of-domain detection, and outperforms the other single-model approaches.
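As a rough illustration of the weight normalization step the abstract refers to, the sketch below rescales a single weight matrix so that its largest singular value stays below a cap, using power iteration to estimate that value. This is a minimal sketch, not the paper's implementation; the function name, the iteration count, and the bound 0.95 are all illustrative assumptions.

```python
import numpy as np

def spectral_normalize(W, n_iter=50, norm_bound=0.95):
    """Rescale W so its largest singular value is at most norm_bound.
    The top singular value is estimated by power iteration, as is common
    for spectral normalization; the bound 0.95 is an illustrative choice."""
    rng = np.random.default_rng(0)
    u = rng.normal(size=W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v                       # top singular value estimate
    return W * min(1.0, norm_bound / sigma)  # rescale only if above the cap

W = np.random.default_rng(1).normal(size=(64, 32))
W_sn = spectral_normalize(W)
```

Applying this cap to every layer bounds how much each layer can stretch distances, which is the mechanism the paper uses to keep hidden-space distances informative about input-space distances.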
Review for NeurIPS paper: Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness
Weaknesses: After reading the paper, it is still arguable whether distance awareness is decisive for determining uncertainty. Distance awareness is a property of conventional local methods, such as those using kernels. The experiment on the synthetic data could be reproduced with conventional Gaussian processes, without neural networks. The effect of replacing the last layer seems obvious with regard to distance awareness, but it is unclear whether this distance-awareness property is indeed advantageous. From this perspective, the theoretical property in Equation (6) is a property of conventional local methods.
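The reviewer's point that distance awareness is the hallmark of kernel methods can be seen in a plain Gaussian process: its predictive variance grows as the test input moves away from the training inputs, with no neural network involved. A minimal sketch with an RBF kernel (all names and settings here are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def rbf(a, b, ls=1.0):
    """RBF kernel between two 1-D input arrays."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / ls**2)

X = np.linspace(-2, 2, 20)                # training inputs
K = rbf(X, X) + 1e-3 * np.eye(len(X))     # kernel matrix with jitter
K_inv = np.linalg.inv(K)

def pred_var(x_star):
    """GP posterior variance at a scalar test input (prior variance 1)."""
    k = rbf(np.array([x_star]), X)
    return float(1.0 - k @ K_inv @ k.T)

v_near = pred_var(0.0)    # inside the training range: small variance
v_far = pred_var(10.0)    # far from the training data: variance near prior
```

Here `v_far` is close to the prior variance of 1 while `v_near` is near zero, which is exactly the distance-aware behaviour at issue.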
Review for NeurIPS paper: Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness
However, a knowledgeable reviewer (R4) issued a clear reject. The ensuing discussion over the reasons for the rejection shows that the meta-reviewer agrees with the concerns of R4, but also that the debate this paper triggers may make it worth publishing. The paper offers two clearly distinct algorithms:
- one, based on Gaussian Processes (GP), builds a loss in which the distance between an example and the training data in the last hidden layer is taken into account for OOD modelling;
- one, based on Spectral Norm (SN), better ties the distance in the hidden space to the distance in the input space; this is justified by Lipschitz bounds that seem very loose.
The objections raised by R4, and also hinted at by other reviewers, are serious: in a deep learning architecture, since the input data lives on a low-dimensional manifold, there is no reason for a distance that is not aware of this manifold to be meaningful (except locally, as shown in adversarial learning).
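To make the SN point concrete: bounding the spectral norm of a linear map directly bounds its Lipschitz constant, so distances can shrink or grow in the hidden space only within singular-value bounds. A toy check under an assumed bound of 0.9 (the matrix, dimensions, and bound are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 16))
W_sn = W * (0.9 / np.linalg.norm(W, 2))   # Lipschitz constant is now 0.9

x1, x2 = rng.normal(size=16), rng.normal(size=16)
hidden_dist = np.linalg.norm(W_sn @ x1 - W_sn @ x2)
input_dist = np.linalg.norm(x1 - x2)

# Upper bound: hidden distance cannot exceed 0.9 * input distance.
# Lower bound (the bi-Lipschitz side): the smallest singular value.
s_min = np.linalg.svd(W_sn, compute_uv=False)[-1]
```

Note that this only relates Euclidean distances; it says nothing about distances along the data manifold, which is precisely the gap the meta-review highlights.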