Bayesian Inference
Continuous-in-time Limit for Bayesian Bandits
Zhu, Yuhua, Izzo, Zachary, Ying, Lexing
This paper revisits the bandit problem in the Bayesian setting. The Bayesian approach formulates the bandit problem as an optimization problem, and the goal is to find the optimal policy which minimizes the Bayesian regret. One of the main challenges facing the Bayesian approach is that computation of the optimal policy is often intractable, especially when the length of the problem horizon or the number of arms is large. In this paper, we first show that under a suitable rescaling, the Bayesian bandit problem converges toward a continuous Hamilton-Jacobi-Bellman (HJB) equation. The optimal policy for the limiting HJB equation can be explicitly obtained for several common bandit problems, and we give numerical methods to solve the HJB equation when an explicit solution is not available. Based on these results, we propose an approximate Bayes-optimal policy for solving Bayesian bandit problems with large horizons. Our method has the added benefit that its computational cost does not increase as the horizon increases.
Flexible and efficient spatial extremes emulation via variational autoencoders
Zhang, Likun, Ma, Xiaoyu, Wikle, Christopher K., Huser, Raphaรซl
Many real-world processes have complex tail dependence structures that cannot be characterized using classical Gaussian processes. More flexible spatial extremes models exhibit appealing extremal dependence properties but are often exceedingly prohibitive to fit and simulate from in high dimensions. In this paper, we develop a new spatial extremes model that has flexible and non-stationary dependence properties, and we integrate it in the encoding-decoding structure of a variational autoencoder (XVAE), whose parameters are estimated via variational Bayes combined with deep learning. The XVAE can be used as a spatio-temporal emulator that characterizes the distribution of potential mechanistic model output states and produces outputs that have the same statistical properties as the inputs, especially in the tail. As an aside, our approach also provides a novel way of making fast inference with complex extreme-value processes. Through extensive simulation studies, we show that our XVAE is substantially more time-efficient than traditional Bayesian inference while also outperforming many spatial extremes models with a stationary dependence structure. To further demonstrate the computational power of the XVAE, we analyze a high-resolution satellite-derived dataset of sea surface temperature in the Red Sea, which includes 30 years of daily measurements at 16703 grid cells. We find that the extremal dependence strength is weaker in the interior of Red Sea and it has decreased slightly over time.
A Primer on Bayesian Neural Networks: Review and Debates
Arbel, Julyan, Pitas, Konstantinos, Vladimirova, Mariia, Fortuin, Vincent
Neural networks have achieved remarkable performance across various problem domains, but their widespread applicability is hindered by inherent limitations such as overconfidence in predictions, lack of interpretability, and vulnerability to adversarial attacks. To address these challenges, Bayesian neural networks (BNNs) have emerged as a compelling extension of conventional neural networks, integrating uncertainty estimation into their predictive capabilities. This comprehensive primer presents a systematic introduction to the fundamental concepts of neural networks and Bayesian inference, elucidating their synergistic integration for the development of BNNs. The target audience comprises statisticians with a potential background in Bayesian methods but lacking deep learning expertise, as well as machine learners proficient in deep neural networks but with limited exposure to Bayesian statistics. We provide an overview of commonly employed priors, examining their impact on model behavior and performance. Additionally, we delve into the practical considerations associated with training and inference in BNNs. Furthermore, we explore advanced topics within the realm of BNN research, acknowledging the existence of ongoing debates and controversies. By offering insights into cutting-edge developments, this primer not only equips researchers and practitioners with a solid foundation in BNNs, but also illuminates the potential applications of this dynamic field. As a valuable resource, it fosters an understanding of BNNs and their promising prospects, facilitating further advancements in the pursuit of knowledge and innovation.
Differential 2D Copula Approximating Transforms via Sobolev Training: 2-Cats Networks
Figueiredo, Flavio, Fernandes, Josรฉ Geraldo, Silva, Jackson, Assunรงรฃo, Renato M.
Copulas are a powerful statistical tool that captures dependencies across data dimensions. When applying Copulas, we can estimate multivariate distribution functions by initially estimating independent marginals, an easy task, and then a single copulating function, $C$, to connect the marginals, a hard task. For two-dimensional data, a copula is a two-increasing function of the form $C: (u,v)\in \mathbf{I}^2 \rightarrow \mathbf{I}$, where $\mathbf{I} = [0, 1]$. In this paper, we show how Neural Networks (NNs) can approximate any two-dimensional copula non-parametrically. Our approach, denoted as 2-Cats, is inspired by the Physics-Informed Neural Networks and Sobolev Training literature. Not only do we show that we can estimate the output of a 2d Copula better than the state-of-the-art, our approach is non-parametric and respects the mathematical properties of a Copula $C$.
Active SLAM: A Review On Last Decade
Ahmed, Muhammad Farhan, Masood, Khayyam, Fremont, Vincent, Fantoni, Isabelle
This article presents a comprehensive review of the Active Simultaneous Localization and Mapping (A-SLAM) research conducted over the past decade. It explores the formulation, applications, and methodologies employed in A-SLAM, particularly in trajectory generation and control-action selection, drawing on concepts from Information Theory (IT) and the Theory of Optimal Experimental Design (TOED). This review includes both qualitative and quantitative analyses of various approaches, deployment scenarios, configurations, path-planning methods, and utility functions within A-SLAM research. Furthermore, this article introduces a novel analysis of Active Collaborative SLAM (AC-SLAM), focusing on collaborative aspects within SLAM systems. It includes a thorough examination of collaborative parameters and approaches, supported by both qualitative and statistical assessments. This study also identifies limitations in the existing literature and suggests potential avenues for future research. This survey serves as a valuable resource for researchers seeking insights into A-SLAM methods and techniques, offering a current overview of A-SLAM formulation.
Transfer Learning for Bayesian Optimization on Heterogeneous Search Spaces
Fan, Zhou, Han, Xinran, Wang, Zi
Bayesian optimization (BO) is a popular black-box function optimization method, which makes sequential decisions based on a Bayesian model, typically a Gaussian process (GP), of the function. To ensure the quality of the model, transfer learning approaches have been developed to automatically design GP priors by learning from observations on "training" functions. These training functions are typically required to have the same domain as the "test" function (black-box function to be optimized). In this paper, we introduce MPHD, a model pre-training method on heterogeneous domains, which uses a neural net mapping from domain-specific contexts to specifications of hierarchical GPs. MPHD can be seamlessly integrated with BO to transfer knowledge across heterogeneous search spaces. Our theoretical and empirical results demonstrate the validity of MPHD and its superior performance on challenging black-box function optimization tasks.
HyperBO+: Pre-training a universal prior for Bayesian optimization with hierarchical Gaussian processes
Fan, Zhou, Han, Xinran, Wang, Zi
Bayesian optimization (BO), while proved highly effective for many black-box function optimization tasks, requires practitioners to carefully select priors that well model their functions of interest. Rather than specifying by hand, researchers have investigated transfer learning based methods to automatically learn the priors, e.g. multi-task BO (Swersky et al., 2013), few-shot BO (Wistuba and Grabocka, 2021) and HyperBO (Wang et al., 2022). However, those prior learning methods typically assume that the input domains are the same for all tasks, weakening their ability to use observations on functions with different domains or generalize the learned priors to BO on different search spaces. In this work, we present HyperBO+: a pre-training approach for hierarchical Gaussian processes that enables the same prior to work universally for Bayesian optimization on functions with different domains. We propose a two-step pre-training method and analyze its appealing asymptotic properties and benefits to BO both theoretically and empirically. On real-world hyperparameter tuning tasks that involve multiple search spaces, we demonstrate that HyperBO+ is able to generalize to unseen search spaces and achieves lower regrets than competitive baselines.
Multi-unit soft sensing permits few-shot learning
Grimstad, Bjarne, Lรธvland, Kristian, Imsland, Lars S.
Recent literature has explored various ways to improve soft sensors using learning algorithms with transferability. Broadly put, the performance of a soft sensor may be strengthened when it is learned by solving multiple tasks. The usefulness of transferability depends on how strongly related the devised learning tasks are. A particularly relevant case for transferability, is when a soft sensor is to be developed for a process of which there are many realizations, e.g. system or device with many implementations from which data is available. Then, each realization presents a soft sensor learning task, and it is reasonable to expect that the different tasks are strongly related. Applying transferability in this setting leads to what we call multi-unit soft sensing, where a soft sensor models a process by learning from data from all of its realizations. This paper explores the learning abilities of a multi-unit soft sensor, which is formulated as a hierarchical model and implemented using a deep neural network. In particular, we investigate how well the soft sensor generalizes as the number of units increase. Using a large industrial dataset, we demonstrate that, when the soft sensor is learned from a sufficient number of tasks, it permits few-shot learning on data from new units. Surprisingly, regarding the difficulty of the task, few-shot learning on 1-3 data points often leads to a high performance on new units.
Bayesian Cram\'er-Rao Bound Estimation with Score-Based Models
Crafts, Evan Scope, Zhang, Xianyang, Zhao, Bo
The Bayesian Cram\'er-Rao bound (CRB) provides a lower bound on the error of any Bayesian estimator under mild regularity conditions. It can be used to benchmark the performance of estimators, and provides a principled design metric for guiding system design and optimization. However, the Bayesian CRB depends on the prior distribution, which is often unknown for many problems of interest. This work develops a new data-driven estimator for the Bayesian CRB using score matching, a statistical estimation technique, to model the prior distribution. The performance of the estimator is analyzed in both the classical parametric modeling regime and the neural network modeling regime. In both settings, we develop novel non-asymptotic bounds on the score matching error and our Bayesian CRB estimator. Our proofs build on results from empirical process theory, including classical bounds and recently introduced techniques for characterizing neural networks, to address the challenges of bounding the score matching error. The performance of the estimator is illustrated empirically on a denoising problem example with a Gaussian mixture prior.
FOSA: Full Information Maximum Likelihood (FIML) Optimized Self-Attention Imputation for Missing Data
Effectively addressing missing values in data imputation is pivotal, particularly for intricate datasets. This study delves into the full information maximum likelihood (FIML) optimized self-attention (FOSA) framework, an innovative approach that amalgamates the strengths of FIML estimation with the capabilities of self-attention neural networks. Our methodology begins with an initial estimation of missing values via FIML, which is subsequently refined by leveraging the self-attention mechanism. Our comprehensive experiments on both simulated and real-world datasets underscore the pronounced advantages of FOSA over traditional FIML techniques, including encapsulating facets of accuracy, computational efficiency, and adaptability to diverse data structures. Intriguingly, even in cases where the structural equation model can be misspecified, leading to sub-optimal FIML estimates, the robust architecture of the FOSA self-attention component adeptly rectifies and optimizes the imputation outcomes. Our empirical tests reveal that FOSA consistently delivers commendable predictions even for approximately 40% random missingness, highlighting its robustness and potential for wide-scale applications in data imputation.