Bayesian Inference
Dynamic Exploration-Exploitation Trade-Off in Active Learning Regression with Bayesian Hierarchical Modeling
Islam, Upala Junaida, Paynabar, Kamran, Runger, George, Iquebal, Ashif Sikandar
Active learning provides a framework to adaptively query the most informative experiments towards learning an unknown black-box function. Various approaches of active learning have been proposed in the literature, however, they either focus on exploration or exploitation in the design space. Methods that do consider exploration-exploitation simultaneously employ fixed or ad-hoc measures to control the trade-off that may not be optimal. In this paper, we develop a Bayesian hierarchical approach, referred as BHEEM, to dynamically balance the exploration-exploitation trade-off as more data points are queried. To sample from the posterior distribution of the trade-off parameter, We subsequently formulate an approximate Bayesian computation approach based on the linear dependence of queried data in the feature space. Simulated and real-world examples show the proposed approach achieves at least 21% and 11% average improvement when compared to pure exploration and exploitation strategies respectively. More importantly, we note that by optimally balancing the trade-off between exploration and exploitation, BHEEM performs better or at least as well as either pure exploration or pure exploitation.
On Sinkhorn's Algorithm and Choice Modeling
Qu, Zhaonan, Galichon, Alfred, Ugander, Johan
For a broad class of choice and ranking models based on Luce's choice axiom, including the Bradley--Terry--Luce and Plackett--Luce models, we show that the associated maximum likelihood estimation problems are equivalent to a classic matrix balancing problem with target row and column sums. This perspective opens doors between two seemingly unrelated research areas, and allows us to unify existing algorithms in the choice modeling literature as special instances or analogs of Sinkhorn's celebrated algorithm for matrix balancing. We draw inspirations from these connections and resolve important open problems on the study of Sinkhorn's algorithm. We first prove the global linear convergence of Sinkhorn's algorithm for non-negative matrices whenever finite solutions to the matrix balancing problem exist. We characterize this global rate of convergence in terms of the algebraic connectivity of the bipartite graph constructed from data. Next, we also derive the sharp asymptotic rate of linear convergence, which generalizes a classic result of Knight (2008), but with a more explicit analysis that exploits an intrinsic orthogonality structure. To our knowledge, these are the first quantitative linear convergence results for Sinkhorn's algorithm for general non-negative matrices and positive marginals. The connections we establish in this paper between matrix balancing and choice modeling could help motivate further transmission of ideas and interesting results in both directions.
SAVME: Efficient Safety Validation for Autonomous Systems Using Meta-Learning
Schlichting, Marc R., Boord, Nina V., Corso, Anthony L., Kochenderfer, Mykel J.
Discovering potential failures of an autonomous system is important prior to deployment. Falsification-based methods are often used to assess the safety of such systems, but the cost of running many accurate simulation can be high. The validation can be accelerated by identifying critical failure scenarios for the system under test and by reducing the simulation runtime. We propose a Bayesian approach that integrates meta-learning strategies with a multi-armed bandit framework. Our method involves learning distributions over scenario parameters that are prone to triggering failures in the system under test, as well as a distribution over fidelity settings that enable fast and accurate simulations. In the spirit of meta-learning, we also assess whether the learned fidelity settings distribution facilitates faster learning of the scenario parameter distributions for new scenarios. We showcase our methodology using a cutting-edge 3D driving simulator, incorporating 16 fidelity settings for an autonomous vehicle stack that includes camera and lidar sensors. We evaluate various scenarios based on an autonomous vehicle pre-crash typology. As a result, our approach achieves a significant speedup, up to 18 times faster compared to traditional methods that solely rely on a high-fidelity simulator.
An Efficient Algorithm for Clustered Multi-Task Compressive Sensing
This paper considers clustered multi-task compressive sensing, a hierarchical model that solves multiple compressive sensing tasks by finding clusters of tasks that leverage shared information to mutually improve signal reconstruction. The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions. The main bottleneck involves repeated matrix inversion and log-determinant computation for multiple large covariance matrices. We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute these covariance matrices. Our approach combines Monte Carlo sampling with iterative linear solvers. Our experiments reveal that compared to the existing baseline, our algorithm can be up to thousands of times faster and an order of magnitude more memory-efficient.
Gradient and Uncertainty Enhanced Sequential Sampling for Global Fit
Lรคmmle, Sven, Bogoclu, Can, Cremanns, Kevin, Roos, Dirk
Most of these applications require computationally demanding simulation models. Machine learning (ML) models have emerged to mitigate the computational cost and are used as surrogates. For this task, ML models learn the relationship between the response of the expensive simulations from a dataset containing past observations. Various ML methods were proposed as surrogate models (sometimes referred as response surface methods), e.g. as solvers for partial differential equations (PDEs) describing the behavior of high-dimensional vector spaces or fields [1-3], or probabilistic approximators of parametric response functions [4]. Among these types of surrogate models, Gaussian Process Regression (GP) [5] is a popular choice [6, 7] because of its flexibility and ability to predict the model uncertainty. However, drawbacks are the selection of a suitable covariance (kernel) function that is application dependent, and the computational burden for larger data sets. Various extensions to GPs such as the deep GPs[8], sparse GPs based on variational inference [9, 10], and efficient matrix decomposition [11] were developed to overcome some of these limitations. One particularly promising approach is the combination of GPs with Artificial Neural Networks (ANNs) to Deep Gaussian Covariance Networks (DGCNs) [12, 13] to learn the non-stationary hyperparameters of the GP together with combinations of different covariance functions.
Human-like Few-Shot Learning via Bayesian Reasoning over Natural Language
A core tension in models of concept learning is that the model must carefully balance the tractability of inference against the expressivity of the hypothesis class. Humans, however, can efficiently learn a broad range of concepts. We introduce a model of inductive learning that seeks to be human-like in that sense. It implements a Bayesian reasoning process where a language model first proposes candidate hypotheses expressed in natural language, which are then re-weighed by a prior and a likelihood. By estimating the prior from human data, we can predict human judgments on learning problems involving numbers and sets, spanning concepts that are generative, discriminative, propositional, and higher-order.
Statistical physics, Bayesian inference and neural information processing
Grant, Erin, Nestler, Sandra, ลimลek, Berfin, Solla, Sara
Lecture notes from the course given by Professor Sara A. Solla at the Les Houches summer school on "Statistical physics of Machine Learning". The notes discuss neural information processing through the lens of Statistical Physics. Contents include Bayesian inference and its connection to a Gibbs description of learning and generalization, Generalized Linear Models as a controlled alternative to backpropagation through time, and linear and non-linear techniques for dimensionality reduction.
Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation and Regression
Zhai, Runtian, Liu, Bingbin, Risteski, Andrej, Kolter, Zico, Ravikumar, Pradeep
Data augmentation is critical to the empirical success of modern self-supervised representation learning, such as contrastive learning and masked language modeling. However, a theoretical understanding of the exact role of augmentation remains limited. Recent work has built the connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator, suggesting that learning a linear probe atop such representation can be connected to RKHS regression. Building on this insight, this work delves into a statistical analysis of augmentation-based pretraining. Starting from the isometry property, a geometric characterization of the target function given by the augmentation, we disentangle the effects of the model and the augmentation, and prove two generalization bounds that are free of model complexity. Our first bound works for an arbitrary encoder, where the prediction error is decomposed as the sum of an estimation error incurred by fitting a linear probe with RKHS regression, and an approximation error entailed by RKHS approximation. Our second bound specifically addresses the case where the encoder is near-optimal, that is it approximates the top-d eigenspace of the RKHS induced by the augmentation. A key ingredient in our analysis is the augmentation complexity, which we use to quantitatively compare different augmentations and analyze their impact on downstream performance. It is widely acknowledged that better data augmentation techniques have been a major driving force in many recent breakthroughs in self-supervised representation learning. For example, contrastive learning (Chen et al., 2020) with aggressive random cropping and strong color distortion greatly improves the performance of vision tasks, and masked prediction with random masking and replacement is among the state-of-the-art representation learning methods for both natural language processing (Devlin et al., 2019) and vision (He et al., 2022). Due to their extraordinary empirical successes, developing a rigorous theoretical understanding of augmentation-based representation learning is an important open problem, and is crucial for inventing new, better and principled augmentation techniques. A key advance was recently made by HaoChen et al. (2021), who analyzed a variant of contrastive representation learning, termed spectral contrastive learning, by connecting it to learning eigenfunctions of a Laplacian operator over a population augmentation graph. This has been followed up by multiple works which extended these results to more general contrastive learning approaches (Saunshi et al., 2022; Johnson et al., 2023; Cabannes et al., 2023).
Provable Offline Preference-Based Reinforcement Learning
Zhan, Wenhao, Uehara, Masatoshi, Kallus, Nathan, Lee, Jason D., Sun, Wen
In this paper, we investigate the problem of offline Preference-based Reinforcement Learning (PbRL) with human feedback where feedback is available in the form of preference between trajectory pairs rather than explicit rewards. Our proposed algorithm consists of two main steps: (1) estimate the implicit reward using Maximum Likelihood Estimation (MLE) with general function approximation from offline data and (2) solve a distributionally robust planning problem over a confidence set around the MLE. We consider the general reward setting where the reward can be defined over the whole trajectory and provide a novel guarantee that allows us to learn any target policy with a polynomial number of samples, as long as the target policy is covered by the offline data. This guarantee is the first of its kind with general function approximation. To measure the coverage of the target policy, we introduce a new single-policy concentrability coefficient, which can be upper bounded by the per-trajectory concentrability coefficient. We also establish lower bounds that highlight the necessity of such concentrability and the difference from standard RL, where state-action-wise rewards are directly observed. We further extend and analyze our algorithm when the feedback is given over action pairs.
A Variational Perspective on Solving Inverse Problems with Diffusion Models
Mardani, Morteza, Song, Jiaming, Kautz, Jan, Vahdat, Arash
Diffusion models have emerged as a key pillar of foundation models in visual domains. One of their critical applications is to universally solve different downstream inverse tasks via a single diffusion prior without re-training for each task. Most inverse tasks can be formulated as inferring a posterior distribution over data (e.g., a full image) given a measurement (e.g., a masked image). This is however challenging in diffusion models since the nonlinear and iterative nature of the diffusion process renders the posterior intractable. To cope with this challenge, we propose a variational approach that by design seeks to approximate the true posterior distribution. We show that our approach naturally leads to regularization by denoising diffusion process (RED-Diff) where denoisers at different timesteps concurrently impose different structural constraints over the image. To gauge the contribution of denoisers from different timesteps, we propose a weighting mechanism based on signal-to-noise-ratio (SNR). Our approach provides a new variational perspective for solving inverse problems with diffusion models, allowing us to formulate sampling as stochastic optimization, where one can simply apply off-the-shelf solvers with lightweight iterates. Our experiments for image restoration tasks such as inpainting and superresolution demonstrate the strengths of our method compared with state-of-the-art sampling-based diffusion models.