Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness
Bayesian neural networks (BNNs) and deep ensembles are principled approaches to estimating the predictive uncertainty of a deep learning model. However, their practicality in real-time, industrial-scale applications is limited by their heavy memory and inference cost. This motivates us to study principled approaches to high-quality uncertainty estimation that require only a single deep neural network (DNN). By formalizing uncertainty quantification as a minimax learning problem, we first identify input distance awareness, i.e., the model's ability to quantify the distance of a test example from the training data in the input space, as a necessary condition for a DNN to achieve high-quality (i.e., minimax-optimal) uncertainty estimation. We then propose the Spectral-normalized Neural Gaussian Process (SNGP), a simple method that improves the distance-awareness ability of modern DNNs by adding a weight normalization step during training and replacing the output layer. On a suite of vision and language understanding tasks and on modern architectures (Wide-ResNet and BERT), SNGP is competitive with deep ensembles in prediction, calibration, and out-of-domain detection, and outperforms the other single-model approaches.
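The abstract's central claim, that distance awareness (knowing how far a test point sits from the training data) underlies high-quality uncertainty estimation, can be illustrated with an exact Gaussian process, whose posterior variance grows with distance from the training set. The snippet below is only an illustrative toy in plain NumPy, not the paper's SNGP code:

```python
import numpy as np

def rbf(a, b, ls=1.0):
    """Squared-exponential kernel between the rows of a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def gp_variance(X_train, X_test, noise=1e-3):
    """GP posterior variance: k(x,x) - k(x,X) (K + noise*I)^-1 k(X,x)."""
    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = rbf(X_test, X_train)
    # k(x, x) = 1 for the RBF kernel
    return 1.0 - np.einsum("ij,jk,ik->i", Ks, np.linalg.inv(K), Ks)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 2))                        # data near the origin
v_near = gp_variance(X_train, np.array([[0.1, 0.0]]))[0]  # in-distribution point
v_far = gp_variance(X_train, np.array([[6.0, 6.0]]))[0]   # far from training data
# v_far approaches the prior variance (1.0), while v_near collapses
# toward the noise floor: the model "knows what it doesn't know".
```

This is exactly the distance-aware behavior that SNGP's GP output layer is designed to retain inside a deep network.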
Review for NeurIPS paper: Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness
Weaknesses: After reading the paper, it remains arguable whether distance awareness is decisive for determining uncertainty. Distance awareness is a property of conventional local methods, such as those using kernels. The experiment on the synthetic data could likely be reproduced with conventional Gaussian Processes, without neural networks. The effect of replacing the last layer seems obvious with regard to distance awareness, but it is unclear whether this distance-awareness property is indeed advantageous. From this perspective, the theoretical property in Equation (6) is a property of conventional local methods.
Review for NeurIPS paper: Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness
However, a knowledgeable reviewer (R4) issued a clear reject. The ensuing discussion over the reasons for the rejection shows that the meta-reviewer agrees with the concerns of R4, but also that the debate this paper triggers may make it worth publishing. This paper offers two clearly distinct algorithms:
- one based on Gaussian Processes (GP) builds a loss where the distance between an example and the training data in the last hidden layer is taken into account for OOD modelling;
- one based on Spectral Norm (SN) better ties the distance in the hidden space to the distance in the input space; this is justified by Lipschitz bounds that seem very loose.
The objections raised by R4, and also hinted at by other reviewers, are serious: in a deep learning architecture, since the input data lives on a low-dimensional manifold, there is no reason for a distance that is not aware of this manifold to be meaningful (except locally, as shown in adversarial learning).
Density-Softmax: Scalable and Calibrated Uncertainty Estimation under Distribution Shifts
Prevalent deterministic deep-learning models suffer from significant over-confidence under distribution shifts. Probabilistic approaches can reduce this problem but struggle with computational efficiency. In this paper, we propose Density-Softmax, a fast and lightweight deterministic method that improves calibrated uncertainty estimation by combining a density function with the softmax layer. By using the likelihood of the latent representation, our approach produces more uncertain predictions when test samples are distant from the training samples. Theoretically, we show that Density-Softmax can produce high-quality uncertainty estimates with neural networks, as it is the solution of the minimax uncertainty risk and is distance-aware, thus reducing the over-confidence of the standard softmax. Empirically, our method enjoys computational efficiency similar to that of a single-forward-pass deterministic model with a standard softmax on shifted toy, vision, and language datasets across modern deep-learning architectures. Notably, Density-Softmax uses 4 times fewer parameters than Deep Ensembles and has 6 times lower latency than a Rank-1 Bayesian Neural Network, while obtaining competitive predictive performance and lower calibration errors under distribution shifts.
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Maryland > Baltimore (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > France (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)
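The mechanism the Density-Softmax abstract describes, tempering predictions by the likelihood of the latent representation, can be sketched schematically. The snippet below is a simplified illustration, not the authors' implementation: it substitutes a kernel density estimate for the paper's learned density model, and normalizes against the densest available point rather than the training maximum; `density_softmax` and the bandwidth are illustrative choices.

```python
import numpy as np

def softmax(z):
    z = z - z.max(-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(-1, keepdims=True)

def kde_likelihood(feats_train, feats_test, bw=0.5):
    """Average Gaussian-kernel density of each test feature under the training features."""
    d2 = ((feats_test[:, None, :] - feats_train[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / bw**2).mean(-1)

def density_softmax(logits, feats_train, feats_test, bw=0.5):
    """Scale logits by the normalized feature likelihood, then apply softmax.
    Low likelihood -> logits shrink toward zero -> near-uniform prediction."""
    p = kde_likelihood(feats_train, feats_test, bw)
    p = p / p.max()  # crude normalization against the densest point seen
    return softmax(logits * p[:, None])

rng = np.random.default_rng(1)
feats_train = rng.normal(size=(100, 2))
feats_test = np.array([[0.0, 0.0],   # in-distribution feature
                       [8.0, 8.0]])  # far from all training features
logits = np.array([[4.0, 0.0],
                   [4.0, 0.0]])      # identical confident logits for both
probs = density_softmax(logits, feats_train, feats_test)
# probs[0] stays confident; probs[1] is pulled toward uniform ([0.5, 0.5]),
# which is the distance-aware behavior the abstract claims.
```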
A Simple Approach to Improve Single-Model Deep Uncertainty via Distance-Awareness
Liu, Jeremiah Zhe, Padhy, Shreyas, Ren, Jie, Lin, Zi, Wen, Yeming, Jerfel, Ghassen, Nado, Zack, Snoek, Jasper, Tran, Dustin, Lakshminarayanan, Balaji
Accurate uncertainty quantification is a major challenge in deep learning, as neural networks can make overconfident errors and assign high-confidence predictions to out-of-distribution (OOD) inputs. The most popular approaches to estimating predictive uncertainty in deep learning combine predictions from multiple neural networks, such as Bayesian neural networks (BNNs) and deep ensembles. However, their practicality in real-time, industrial-scale applications is limited by their high memory and computational cost. Furthermore, ensembles and BNNs do not necessarily fix all the issues with the underlying member networks. In this work, we study principled approaches to improve the uncertainty properties of a single network, based on a single, deterministic representation. By formalizing uncertainty quantification as a minimax learning problem, we first identify distance awareness, i.e., the model's ability to quantify the distance of a test example from the training data, as a necessary condition for a DNN to achieve high-quality (i.e., minimax-optimal) uncertainty estimation. We then propose the Spectral-normalized Neural Gaussian Process (SNGP), a simple method that improves the distance-awareness ability of modern DNNs with two simple changes: (1) applying spectral normalization to the hidden weights to enforce bi-Lipschitz smoothness in the representations, and (2) replacing the last output layer with a Gaussian process layer. On a suite of vision and language understanding benchmarks and on modern architectures (Wide-ResNet and BERT), SNGP consistently outperforms other single-model approaches in prediction, calibration, and out-of-domain detection. Furthermore, SNGP provides complementary benefits to popular techniques such as deep ensembles and data augmentation, making it a simple and scalable building block for probabilistic deep learning.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Massachusetts (0.04)
- North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
- Education (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.67)
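The two changes named in the SNGP abstract, spectral normalization of hidden weights and a Gaussian-process output layer, can each be sketched in a few lines. The NumPy illustration below pairs power-iteration spectral normalization with a random-Fourier-feature GP layer and a Laplace-style feature covariance; it is a sketch under these assumptions, not the authors' released implementation, and `RFFGPLayer`, the ridge value, and the feature count are illustrative choices.

```python
import numpy as np

def spectral_normalize(W, c=0.95, n_iter=20, rng=None):
    """Estimate the top singular value by power iteration, then rescale W
    so its spectral norm is at most c (enforcing a Lipschitz bound)."""
    rng = rng or np.random.default_rng(0)
    u = rng.normal(size=W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u; v /= np.linalg.norm(v)
        u = W @ v; u /= np.linalg.norm(u)
    sigma = u @ W @ v
    return W * min(1.0, c / sigma)

class RFFGPLayer:
    """Random-feature approximation of an RBF-kernel GP output layer:
    phi(h) = sqrt(2/D) * cos(W h + b) with fixed random W, b."""
    def __init__(self, dim, n_features=256, rng=None):
        rng = rng or np.random.default_rng(0)
        self.W = rng.normal(size=(n_features, dim))
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
        self.D = n_features

    def features(self, H):
        return np.sqrt(2.0 / self.D) * np.cos(H @ self.W.T + self.b)

    def fit(self, H, ridge=1e-2):
        """Laplace-style precision of the feature-space posterior."""
        Phi = self.features(H)
        self.Sigma_inv = Phi.T @ Phi + ridge * np.eye(self.D)

    def variance(self, H):
        Phi = self.features(H)
        return np.einsum("ij,jk,ik->i", Phi, np.linalg.inv(self.Sigma_inv), Phi)

# (1) Spectral normalization caps the layer's spectral norm near c.
sn_W = spectral_normalize(np.random.default_rng(2).normal(size=(8, 8)))
top_sv = np.linalg.svd(sn_W, compute_uv=False)[0]

# (2) The GP layer's predictive variance grows with distance from training data.
gp = RFFGPLayer(dim=2)
gp.fit(np.random.default_rng(3).normal(size=(100, 2)))
v_near = gp.variance(np.array([[0.0, 0.0]]))[0]
v_far = gp.variance(np.array([[10.0, 0.0]]))[0]
```

Together, (1) keeps hidden distances informative about input distances, and (2) converts hidden distances into calibrated predictive variance.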
Distance-Aware DAG Embedding for Proximity Search on Heterogeneous Graphs
Liu, Zemin (Zhejiang University) | Zheng, Vincent W. (Advanced Digital Sciences Center) | Zhao, Zhou (Zhejiang University) | Zhu, Fanwei (Zhejiang University City College) | Chang, Kevin Chen-Chuan (University of Illinois at Urbana-Champaign) | Wu, Minghui (Zhejiang University City College) | Ying, Jing (Zhejiang University)
Proximity search on heterogeneous graphs aims to measure the proximity between two nodes on a graph w.r.t. some semantic relation, for ranking. Pioneering work often tries to measure such proximity by the paths connecting the two nodes. However, paths, as linear sequences, have limited expressiveness for complex network connections. In this paper, we explore a more expressive DAG (directed acyclic graph) data structure for modeling the connections between two nodes. In particular, we are interested in learning a representation for the DAGs that encodes the proximity between the two nodes. We face two challenges in using DAGs: how to efficiently generate DAGs, and how to effectively learn DAG embeddings for proximity search. We find distance awareness to be important for proximity search and the key to solving the above challenges. We therefore develop a novel Distance-aware DAG Embedding (D2AGE) model. We evaluate D2AGE on three benchmark data sets with six semantic relations, and show that D2AGE outperforms the state-of-the-art baselines. We release the code at https://github.com/shuaiOKshuai.
- Europe > France (0.05)
- Asia > China (0.04)
- North America > United States > Illinois (0.04)
- Asia > Singapore (0.04)
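As one hypothetical illustration of generating a DAG between two nodes (a sketch of the general idea only, not the D2AGE generation procedure): the union of all shortest paths from a source to a target, with edges oriented away from the source, is guaranteed to be a DAG, because every kept edge strictly increases the BFS distance from the source.

```python
from collections import deque

def bfs_dist(adj, src):
    """Unweighted shortest-path distances from src over an adjacency dict."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def shortest_path_dag(adj, s, t):
    """Union of all shortest s->t paths, oriented away from s.
    Edge (u, v) lies on a shortest path iff d_s(u) + 1 + d_t(v) == d_s(t)."""
    ds, dt = bfs_dist(adj, s), bfs_dist(adj, t)
    L = ds[t]
    return {(u, v) for u in adj for v in adj[u]
            if u in ds and v in dt and ds[u] + 1 + dt[v] == L}

# Toy undirected graph with two shortest a->e paths (via b and via c);
# both paths merge into a single DAG rooted at a.
adj = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"],
       "d": ["b", "c", "e"], "e": ["d"]}
dag = shortest_path_dag(adj, "a", "e")
# -> {("a","b"), ("a","c"), ("b","d"), ("c","d"), ("d","e")}
```

The example shows the abstract's point about expressiveness: a single linear path keeps only one of the two routes through `b` or `c`, while the DAG retains both.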