quantifying uncertainty
Conformal Prediction on Quantifying Uncertainty of Dynamic Systems
Liang, Aoming, Liu, Qi, Xu, Lei, Sohrab, Fahad, Cui, Weicheng, Song, Changhui, Gabbouj, Moncef
Numerous studies have focused on learning and understanding the dynamics of physical systems from video data, such as spatial intelligence. Artificial intelligence requires quantitative assessments of the uncertainty of the model to ensure reliability. However, there is still a relative lack of systematic assessment of the uncertainties, particularly the uncertainties of the physical data. Our motivation is to introduce conformal prediction into the uncertainty assessment of dynamical systems, providing a method supported by theoretical guarantees. This paper uses the conformal prediction method to assess uncertainties with benchmark operator learning methods. We have also compared the Monte Carlo Dropout and Ensemble methods in the partial differential equations dataset, effectively evaluating uncertainty through straight roll-outs, making it ideal for time-series tasks.
QUITE: Quantifying Uncertainty in Natural Language Text in Bayesian Reasoning Scenarios
Schrader, Timo Pierre, Lange, Lukas, Razniewski, Simon, Friedrich, Annemarie
Reasoning is key to many decision making processes. It requires consolidating a set of rule-like premises that are often associated with degrees of uncertainty and observations to draw conclusions. In this work, we address both the case where premises are specified as numeric probabilistic rules and situations in which humans state their estimates using words expressing degrees of certainty. Existing probabilistic reasoning datasets simplify the task, e.g., by requiring the model to only rank textual alternatives, by including only binary random variables, or by making use of a limited set of templates that result in less varied text. In this work, we present QUITE, a question answering dataset of real-world Bayesian reasoning scenarios with categorical random variables and complex relationships. QUITE provides high-quality natural language verbalizations of premises together with evidence statements and expects the answer to a question in the form of an estimated probability. We conduct an extensive set of experiments, finding that logic-based models outperform out-of-the-box large language models on all reasoning types (causal, evidential, and explaining-away). Our results provide evidence that neuro-symbolic models are a promising direction for improving complex reasoning. We release QUITE and code for training and experiments on Github.
Towards Reproducible LLM Evaluation: Quantifying Uncertainty in LLM Benchmark Scores
Blackwell, Robert E., Barry, Jon, Cohn, Anthony G.
Large language models (LLMs) are stochastic, and not all models give deterministic answers, even when setting temperature to zero with a fixed random seed. However, few benchmark studies attempt to quantify uncertainty, partly due to the time and cost of repeated experiments. We use benchmarks designed for testing LLMs' capacity to reason about cardinal directions to explore the impact of experimental repeats on mean score and prediction interval. We suggest a simple method for cost-effectively quantifying the uncertainty of a benchmark score and make recommendations concerning reproducible LLM evaluation.
QUTE: Quantifying Uncertainty in TinyML models with Early-exit-assisted ensembles
Ghanathe, Nikhil P, Wilton, Steve
Existing methods for uncertainty quantification incur massive memory and compute overhead, often requiring multiple models/inferences. Hence they are impractical on ultra-low-power KB-sized TinyML devices. To reduce overhead, prior works have proposed the use of early-exit networks as ensembles to quantify uncertainty in a single forward-pass. However, they still have a prohibitive cost for tinyML. To address these challenges, we propose QUTE, a novel resource-efficient early-exit-assisted ensemble architecture optimized for tinyML models. QUTE adds additional output blocks at the final exit of the base network and distills the knowledge of early-exits into these blocks to create a diverse and lightweight ensemble architecture. Our results show that QUTE outperforms popular prior works, and improves the quality of uncertainty estimates by 6% with 3.1x lower model size on average compared to the most relevant prior work. Furthermore, we demonstrate that QUTE is also effective in detecting co-variate shifted and out-of-distribution inputs, and shows competitive performance relative to G-ODIN, a state-of-the-art generalized OOD detector.
Quantifying Uncertainty in Motion Prediction with Variational Bayesian Mixture
Lu, Juanwu, Cui, Can, Ma, Yunsheng, Bera, Aniket, Wang, Ziran
Safety and robustness are crucial factors in developing trustworthy autonomous vehicles. One essential aspect of addressing these factors is to equip vehicles with the capability to predict future trajectories for all moving objects in the surroundings and quantify prediction uncertainties. In this paper, we propose the Sequential Neural Variational Agent (SeNeVA), a generative model that describes the distribution of future trajectories for a single moving object. Our approach can distinguish Out-of-Distribution data while quantifying uncertainty and achieving competitive performance compared to state-of-the-art methods on the Argoverse 2 and INTERACTION datasets. Specifically, a 0.446 meters minimum Final Displacement Error, a 0.203 meters minimum Average Displacement Error, and a 5.35% Miss Rate are achieved on the INTERACTION test set. Extensive qualitative and quantitative analysis is also provided to evaluate the proposed model. Our open-source code is available at https://github.com/PurdueDigitalTwin/seneva.
Quantifying Uncertainties of Contact Classifications in a Human-Robot Collaboration with Parallel Robots
Mohammad, Aran, Muscheid, Hendrik, Schappler, Moritz, Seel, Thomas
In human-robot collaboration, unintentional physical contacts occur in the form of collisions and clamping, which must be detected and classified separately for a reaction. If certain collision or clamping situations are misclassified, reactions might occur that make the true contact case more dangerous. This work analyzes data-driven modeling based on physically modeled features like estimated external forces for clamping and collision classification with a real parallel robot. The prediction reliability of a feedforward neural network is investigated. Quantification of the classification uncertainty enables the distinction between safe versus unreliable classifications and optimal reactions like a retraction movement for collisions, structure opening for the clamping joint, and a fallback reaction in the form of a zero-g mode. This hypothesis is tested with experimental data of clamping and collision cases by analyzing dangerous misclassifications and then reducing them by the proposed uncertainty quantification. Finally, it is investigated how the approach of this work influences correctly classified clamping and collision scenarios.
Quantifying Uncertainty In Traffic State Estimation Using Generative Adversarial Networks
Mo, Zhaobin, Fu, Yongjie, Di, Xuan
This paper aims to quantify uncertainty in traffic state estimation (TSE) using the generative adversarial network based physics-informed deep learning (PIDL). The uncertainty of the focus arises from fundamental diagrams, in other words, the mapping from traffic density to velocity. To quantify uncertainty for the TSE problem is to characterize the robustness of predicted traffic states. Since its inception, generative adversarial networks (GAN) have become a popular probabilistic machine learning framework. In this paper, we will inform the GAN based predictions using stochastic traffic flow models and develop a GAN based PIDL framework for TSE, named ``PhysGAN-TSE". By conducting experiments on a real-world dataset, the Next Generation SIMulation (NGSIM) dataset, this method is shown to be more robust for uncertainty quantification than the pure GAN model or pure traffic flow models. Two physics models, the Lighthill-Whitham-Richards (LWR) and the Aw-Rascle-Zhang (ARZ) models, are compared as the physics components for the PhysGAN, and results show that the ARZ-based PhysGAN achieves a better performance than the LWR-based one.
Quantifying Uncertainty with Probabilistic Machine Learning Modeling in Wireless Sensing
Kachroo, Amit, Chinnapalli, Sai Prashanth
The application of machine learning (ML) techniques in wireless communication domain has seen a tremendous growth over the years especially in the wireless sensing domain. However, the questions surrounding the ML model's inference reliability, and uncertainty associated with its predictions are never answered or communicated properly. This itself raises a lot of questions on the transparency of these ML systems. Developing ML systems with probabilistic modeling can solve this problem easily, where one can quantify uncertainty whether it is arising from the data (irreducible error or aleotoric uncertainty) or from the model itself (reducible or epistemic uncertainty). This paper describes the idea behind these types of uncertainty quantification in detail and uses a real example of WiFi channel state information (CSI) based sensing for motion/no-motion cases to demonstrate the uncertainty modeling. This work will serve as a template to model uncertainty in predictions not only for WiFi sensing but for most wireless sensing applications ranging from WiFi to millimeter wave radar based sensing that utilizes AI/ML models.
Discriminative Jackknife: Quantifying Uncertainty in Deep Learning via Higher-Order Influence Functions
Alaa, Ahmed M., van der Schaar, Mihaela
Deep learning models achieve high predictive accuracy across a broad spectrum of tasks, but rigorously quantifying their predictive uncertainty remains challenging. Usable estimates of predictive uncertainty should (1) cover the true prediction targets with high probability, and (2) discriminate between high- and low-confidence prediction instances. Existing methods for uncertainty quantification are based predominantly on Bayesian neural networks; these may fall short of (1) and (2) -- i.e., Bayesian credible intervals do not guarantee frequentist coverage, and approximate posterior inference undermines discriminative accuracy. In this paper, we develop the discriminative jackknife (DJ), a frequentist procedure that utilizes influence functions of a model's loss functional to construct a jackknife (or leave-one-out) estimator of predictive confidence intervals. The DJ satisfies (1) and (2), is applicable to a wide range of deep learning models, is easy to implement, and can be applied in a post-hoc fashion without interfering with model training or compromising its accuracy. Experiments demonstrate that DJ performs competitively compared to existing Bayesian and non-Bayesian regression baselines.
How Much Can I Trust You? -- Quantifying Uncertainties in Explaining Neural Networks
Bykov, Kirill, Höhne, Marina M. -C., Müller, Klaus-Robert, Nakajima, Shinichi, Kloft, Marius
Explainable AI (XAI) aims to provide interpretations for predictions made by learning machines, such as deep neural networks, in order to make the machines more transparent for the user and furthermore trustworthy also for applications in e.g. safety-critical areas. So far, however, no methods for quantifying uncertainties of explanations have been conceived, which is problematic in domains where a high confidence in explanations is a prerequisite. We therefore contribute by proposing a new framework that allows to convert any arbitrary explanation method for neural networks into an explanation method for Bayesian neural networks, with an in-built modeling of uncertainties. Within the Bayesian framework a network's weights follow a distribution that extends standard single explanation scores and heatmaps to distributions thereof, in this manner translating the intrinsic network model uncertainties into a quantification of explanation uncertainties. This allows us for the first time to carve out uncertainties associated with a model explanation and subsequently gauge the appropriate level of explanation confidence for a user (using percentiles). We demonstrate the effectiveness and usefulness of our approach extensively in various experiments, both qualitatively and quantitatively.