Bayesian Learning
Quantification of Uncertainties in Probabilistic Deep Neural Network by Implementing Boosting of Variational Inference
Modern neural network architectures have achieved remarkable accuracies but remain highly dependent on their training data, often lacking interpretability in their learned mappings. While effective on large datasets, they tend to overfit on smaller ones. Probabilistic neural networks, such as those utilizing variational inference, address this limitation by incorporating uncertainty estimation through weight distributions rather than point estimates. However, standard variational inference often relies on a single-density approximation, which can lead to poor posterior estimates and hinder model performance. We propose Boosted Bayesian Neural Networks (BBNN), a novel approach that enhances neural network weight distribution approximations using Boosting Variational Inference (BVI). By iteratively constructing a mixture of densities, BVI expands the approximating family, enabling a more expressive posterior that leads to improved generalization and uncertainty estimation. While this approach increases computational complexity, it significantly enhances accuracy, an essential tradeoff, particularly in high-stakes applications such as medical diagnostics, where false negatives can have severe consequences. Our experimental results demonstrate that BBNN achieves ~5% higher accuracy compared to conventional neural networks while providing superior uncertainty quantification. This improvement highlights the effectiveness of leveraging a mixture-based variational family to better approximate the posterior distribution, ultimately advancing probabilistic deep learning.
On the Precise Asymptotics of Universal Inference
Traditional statistical inference techniques, such as likelihood ratio tests, have seen renewed interest in recent years, driven in part by the growing emphasis on methodologies based on e-values and e-processes, rather than conventional p-values. Unlike p-values, e-values possess several properties that make them particularly appealing for modern data science applications. In particular, e-value-based methods have played an instrumental role in advancing multiple and safe testing (Grünwald et al., 2020; Vovk and Wang, 2021; Shafer, 2021; Wang and Ramdas, 2022), anytime-valid inference (Waudby-Smith and Ramdas, 2024), and asymptotic confidence sequences (Waudby-Smith et al., 2024). This list is far from exhaustive, and we refer to Ramdas et al. (2023) for a broader overview of recent developments. This manuscript revisits the work of Wasserman et al. (2020), who introduced universal inference, a general hypothesis testing framework based on the split likelihood ratio statistic, which is itself an e-value. This framework provides simple procedures for many complex composite testing problems that previously lacked actionable solutions, such as testing log-concavity (Dunn et al., 2024) and causal inference under unknown causal structures (Strieder et al., 2021), among others. Specifically, universal inference combines the classical idea of sample splitting (Cox, 1975) and Markov's inequality to establish finite-sample validity. The procedure follows three steps.
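The sample-splitting idea can be illustrated with a minimal sketch of a split likelihood ratio test. The setting is an assumption chosen for brevity and is not taken from the manuscript: i.i.d. Gaussian data with unit variance and a point null on the mean. The function names gaussian_loglik and split_lrt are hypothetical.

```python
import math
import random


def gaussian_loglik(data, mu):
    """Log-likelihood of i.i.d. N(mu, 1) observations (assumed model)."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (x - mu) ** 2 for x in data)


def split_lrt(data, mu_null=0.0, alpha=0.05, seed=0):
    """Split likelihood ratio test of H0: mu = mu_null at level alpha.

    Returns (log_statistic, reject).
    """
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    d0, d1 = shuffled[:half], shuffled[half:]

    # Fit any estimator on D1; here the unrestricted MLE (the sample mean).
    mu_hat_1 = sum(d1) / len(d1)

    # Null-restricted MLE on D0; for a point null this is mu_null itself.
    mu_hat_0 = mu_null

    # Log of the split likelihood ratio, evaluated on D0 only.
    log_stat = gaussian_loglik(d0, mu_hat_1) - gaussian_loglik(d0, mu_hat_0)

    # Markov's inequality: under H0 the statistic has expectation at most 1,
    # so rejecting when it exceeds 1/alpha gives level alpha.
    reject = log_stat > math.log(1.0 / alpha)
    return log_stat, reject
```

Because the estimate from D1 is treated as fixed when the ratio is evaluated on D0, the statistic has expectation at most one under the null. This is why the 1/alpha threshold requires no asymptotic approximation.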
Conformal Prediction and Human Decision Making
Hullman, Jessica, Wu, Yifan, Xie, Dawei, Guo, Ziyang, Gelman, Andrew
Methods to quantify uncertainty in predictions from arbitrary models are in demand in high-stakes domains like medicine and finance. Conformal prediction has emerged as a popular method for producing a set of predictions with specified average coverage, in place of a single prediction and confidence value. However, the value of conformal prediction sets to assist human decisions remains elusive due to the murky relationship between coverage guarantees and decision makers' goals and strategies. How should we think about conformal prediction sets as a form of decision support? We outline a decision theoretic framework for evaluating predictive uncertainty as informative signals, then contrast what can be said within this framework about idealized use of calibrated probabilities versus conformal prediction sets. Informed by prior empirical results and theories of human decisions under uncertainty, we formalize a set of possible strategies by which a decision maker might use a prediction set. We identify ways in which conformal prediction sets and posthoc predictive uncertainty quantification more broadly are in tension with common goals and needs in human-AI decision making. We give recommendations for future research in predictive uncertainty quantification to support human decision makers.
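For readers unfamiliar with how such sets are constructed, a minimal sketch of split conformal prediction for classification follows. The nonconformity score (one minus the predicted probability of a label), the function names, and the calibration setup are illustrative assumptions and are not taken from the paper.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Score threshold from held-out calibration scores.

    cal_scores: nonconformity scores of the true labels on calibration data.
    alpha: target miscoverage, so sets aim for coverage of at least 1 - alpha.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if k > n:
        return float("inf")  # too few calibration points: include every label
    return sorted(cal_scores)[k - 1]


def prediction_set(class_probs, threshold):
    """All labels whose score 1 - p(label) does not exceed the threshold."""
    return [label for label, p in class_probs.items() if 1.0 - p <= threshold]
```

Under exchangeability of calibration and test points, sets built this way contain the true label with probability at least 1 - alpha on average. That guarantee is marginal over test points, which is the "average coverage" the abstract refers to.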
Deep Learning Advancements in Anomaly Detection: A Comprehensive Survey
Huang, Haoqi, Wang, Ping, Pei, Jianhua, Wang, Jiacheng, Alexanian, Shahen, Niyato, Dusit
The rapid expansion of data from diverse sources has made anomaly detection (AD) increasingly essential for identifying unexpected observations that may signal system failures, security breaches, or fraud. As datasets become more complex and high-dimensional, traditional detection methods struggle to effectively capture intricate patterns. Advances in deep learning have made AD methods more powerful and adaptable, improving their ability to handle high-dimensional and unstructured data. This survey provides a comprehensive review of over 180 recent studies, focusing on deep learning-based AD techniques. We categorize and analyze these methods into reconstruction-based and prediction-based approaches, highlighting their effectiveness in modeling complex data distributions. Additionally, we explore the integration of traditional and deep learning methods, highlighting how hybrid approaches combine the interpretability of traditional techniques with the flexibility of deep learning to enhance detection accuracy and model transparency. Finally, we identify open issues and propose future research directions to advance the field of AD. This review bridges gaps in existing literature and serves as a valuable resource for researchers and practitioners seeking to enhance AD techniques using deep learning.
An Analysis of Safety Guarantees in Multi-Task Bayesian Optimization
Luebsen, Jannis O., Eichler, Annika
This paper addresses the integration of additional information sources into a Bayesian optimization framework while ensuring that safety constraints are satisfied. The interdependencies between these information sources are modeled using an unknown correlation matrix. We explore how uniform error bounds must be adjusted to maintain constraint satisfaction throughout the optimization process, considering both Bayesian and frequentist statistical perspectives. This is achieved by appropriately scaling the error bounds based on a confidence interval that can be estimated from the data. Furthermore, the efficacy of the proposed approach is demonstrated through experiments on two benchmark functions and a controller parameter optimization problem. Our results highlight a significant improvement in sample efficiency, demonstrating the method's suitability for optimizing expensive-to-evaluate functions. Many practical optimization problems can be formulated as the optimization of a black-box function, e.g., because of their complex underlying physics or the requirement of impractical identification processes. Black-box optimization algorithms bypass the need for models in optimization. In essence, these algorithms sequentially evaluate the black-box function for some input while reducing the cost. In the last decade, Bayesian optimization (BO) has emerged as a promising method for solving exactly this set of problems. This method involves constructing a probabilistic surrogate model of an arbitrary objective function with minimal assumptions. The utilization of Gaussian processes (GPs) enables the incorporation of prior knowledge about the objective function, making BO particularly well-suited for scenarios where function evaluations are costly and observations may be noisy. As a simple example of BO, consider the optimization of a PID controller for unit step reference tracking, where the plant dynamics are unknown.
A potential cost function that measures tracking accuracy could be the mean-squared error of the plant output and the step reference for a designated time window. The black-box function is now the function that maps the PID parameters to the image of the cost function. An evaluation corresponds to running the step response of the system with the specified PID parameters.
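The black-box function described above can be sketched as follows. The first-order plant, the forward-Euler discretization, the time window, and the name step_response_cost are assumptions made for illustration only. In the setting the paper describes, the plant dynamics are unknown and each evaluation would be a run on the real system.

```python
def step_response_cost(kp, ki, kd, horizon=5.0, dt=0.01, tau=1.0):
    """Mean-squared tracking error of a unit step response under PID control.

    The plant is a hypothetical first-order lag, tau * dy/dt = -y + u,
    standing in for the unknown dynamics. The optimizer sees only the
    mapping (kp, ki, kd) -> cost.
    """
    n_steps = int(round(horizon / dt))
    y = 0.0            # plant output, starting at rest
    integral = 0.0     # running integral of the tracking error
    prev_error = 1.0   # error at t = 0 (reference 1, output 0)
    sq_err_sum = 0.0

    for _ in range(n_steps):
        error = 1.0 - y                       # unit step reference
        integral += error * dt
        derivative = (error - prev_error) / dt
        u = kp * error + ki * integral + kd * derivative
        y += dt * (-y + u) / tau              # forward-Euler plant update
        prev_error = error
        sq_err_sum += error ** 2

    return sq_err_sum / n_steps
```

A Bayesian optimization loop would call step_response_cost(kp, ki, kd) once per candidate parameter set and fit its surrogate model to the returned costs. With all gains at zero the output never moves, so the cost equals one, which gives a simple reference point for comparing candidates.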
Robust Decision-Making Via Free Energy Minimization
Shafiei, Allahkaram, Jesawada, Hozefa, Friston, Karl, Russo, Giovanni
Despite their groundbreaking performance, state-of-the-art autonomous agents can misbehave when training and environmental conditions become inconsistent, with minor mismatches leading to undesirable behaviors or even catastrophic failures. Robustness towards these training/environment ambiguities is a core requirement for intelligent agents and its fulfillment is a long-standing challenge when deploying agents in the real world. Here, departing from mainstream views seeking robustness through training, we introduce DR-FREE, a free energy model that installs this core property by design. It directly wires robustness into the agent decision-making mechanisms via free energy minimization. By combining a robust extension of the free energy principle with a novel resolution engine, DR-FREE returns a policy that is optimal-yet-robust against ambiguity. Moreover, for the first time, it reveals the mechanistic role of ambiguity on optimal decisions and requisite Bayesian belief updating. We evaluate DR-FREE on an experimental testbed involving real rovers navigating an ambiguous environment filled with obstacles. Across all the experiments, DR-FREE enables robots to successfully navigate towards their goal even when, in contrast, standard free energy minimizing agents that do not use DR-FREE fail. In short, DR-FREE can tackle scenarios that elude previous methods: this milestone may inspire both deployment in multi-agent settings and, at a perhaps deeper level, the quest for a biologically plausible explanation of how natural agents - with little or no training - survive in capricious environments.
Rendering Transparency to Ranking in Educational Assessment via Bayesian Comparative Judgement
Gray, Andy, Rahat, Alma, Lindsay, Stephen, Pearson, Jen, Crick, Tom
Ensuring transparency in educational assessment is increasingly critical, particularly post-pandemic, as demand grows for fairer and more reliable evaluation methods. Comparative Judgement (CJ) offers a promising alternative to traditional assessments, yet concerns remain about its perceived opacity. This paper examines how Bayesian Comparative Judgement (BCJ) enhances transparency by integrating prior information into the judgement process, providing a structured, data-driven approach that improves interpretability and accountability. BCJ assigns probabilities to judgement outcomes, offering quantifiable measures of uncertainty and deeper insights into decision confidence. By systematically tracking how prior data and successive judgements inform final rankings, BCJ clarifies the assessment process and helps identify assessor disagreements. Multi-criteria BCJ extends this by evaluating multiple learning outcomes (LOs) independently, preserving the richness of CJ while producing transparent, granular rankings aligned with specific assessment goals. It also enables a holistic ranking derived from individual LOs, ensuring comprehensive evaluations without compromising detailed feedback. Using a real higher education dataset with professional markers in the UK, we demonstrate BCJ's quantitative rigour and ability to clarify ranking rationales. Through qualitative analysis and discussions with experienced CJ practitioners, we explore its effectiveness in contexts where transparency is crucial, such as high-stakes national assessments. We highlight the benefits and limitations of BCJ, offering insights into its real-world application across various educational settings.
Modelling Child Learning and Parsing of Long-range Syntactic Dependencies
Mahon, Louis, Johnson, Mark, Steedman, Mark
This work develops a probabilistic child language acquisition model to learn a range of linguistic phenomena, most notably long-range syntactic dependencies of the sort found in object wh-questions, among other constructions. The model is trained on a corpus of real child-directed speech, where each utterance is paired with a logical form as a meaning representation. It then learns both word meanings and language-specific syntax simultaneously. After training, the model can deduce the correct parse tree and word meanings for a given utterance-meaning pair, and can infer the meaning if given only the utterance. The successful modelling of long-range dependencies is theoretically important because it exploits aspects of the model that are, in general, trans-context-free.
Does the Appearance of Autonomous Conversational Robots Affect User Spoken Behaviors in Real-World Conference Interactions?
Pang, Zi Haur, Fu, Yahui, Lala, Divesh, Elmers, Mikey, Inoue, Koji, Kawahara, Tatsuya
We investigate the impact of robot appearance on users' spoken behavior during real-world interactions by comparing a human-like android, ERICA, with a less anthropomorphic humanoid, TELECO. Analyzing data from 42 participants at SIGDIAL 2024, we extracted linguistic features such as disfluencies and syntactic complexity from conversation transcripts. The results showed moderate effect sizes, suggesting that participants produced fewer disfluencies and employed more complex syntax when interacting with ERICA. Further analysis involving training classification models like Naïve Bayes, which achieved an F1-score of 71.60%, and conducting feature importance analysis, highlighted the significant role of disfluencies and syntactic complexity in interactions with robots of varying human-like appearances. Discussing these findings within the frameworks of cognitive load and Communication Accommodation Theory, we conclude that designing robots to elicit more structured and fluent user speech can enhance their communicative alignment with humans.
Do you understand epistemic uncertainty? Think again! Rigorous frequentist epistemic uncertainty estimation in regression
Foglia, Enrico, Bobbia, Benjamin, Durasov, Nikita, Bauerheim, Michael, Fua, Pascal, Moreau, Stephane, Jardin, Thierry
Quantifying model uncertainty is critical for understanding prediction reliability, yet distinguishing between aleatoric and epistemic uncertainty remains challenging. We extend recent work from classification to regression to provide a novel frequentist approach to epistemic and aleatoric uncertainty estimation. We train models to generate conditional predictions by feeding their initial output back as an additional input. This method allows for a rigorous measurement of model uncertainty by observing how prediction responses change when conditioned on the model's previous answer. We provide a complete theoretical framework to analyze epistemic uncertainty in regression in a frequentist way, and explain how it can be exploited in practice to gauge a model's uncertainty, with minimal changes to the original architecture.