Bayesian Inference
A Proximal Algorithm for Sampling
We study sampling problems associated with potentials that lack smoothness. The potentials can be either convex or non-convex. Departing from the standard smooth setting, the potentials are only assumed to be weakly smooth or non-smooth, or the summation of multiple such functions. We develop a sampling algorithm that resembles proximal algorithms in optimization for this challenging sampling task. Our algorithm is based on a special case of Gibbs sampling known as the alternating sampling framework (ASF). The key contribution of this work is a practical realization of the ASF based on rejection sampling for both non-convex and convex potentials that are not necessarily smooth. In almost all the cases of sampling considered in this work, our proximal sampling algorithm achieves better complexity than all existing methods.
Learning thermodynamically constrained equations of state with uncertainty
Sharma, Himanshu, Gaffney, Jim A., Tsapetis, Dimitrios, Shields, Michael D.
Numerical simulations of high energy-density experiments require equation of state (EOS) models that relate a material's thermodynamic state variables -- specifically pressure, volume/density, energy, and temperature. EOS models are typically constructed using a semi-empirical parametric methodology, which assumes a physics-informed functional form with many tunable parameters calibrated using experimental/simulation data. Since there are inherent uncertainties in the calibration data (parametric uncertainty) and the assumed functional EOS form (model uncertainty), it is essential to perform uncertainty quantification (UQ) to improve confidence in the EOS predictions. Model uncertainty is challenging for UQ studies since it requires exploring the space of all possible physically consistent functional forms. Thus, it is often neglected in favor of parametric uncertainty, which is easier to quantify without violating thermodynamic laws. This work presents a data-driven machine learning approach to constructing EOS models that naturally captures model uncertainty while satisfying the necessary thermodynamic consistency and stability constraints. We propose a novel framework based on physics-informed Gaussian process regression (GPR) that automatically captures total uncertainty in the EOS and can be jointly trained on both simulation and experimental data sources. A GPR model for the shock Hugoniot is derived and its uncertainties are quantified using the proposed framework. We apply the proposed model to learn the EOS for the diamond solid state of carbon, using both density functional theory data and experimental shock Hugoniot data to train the model and show that the prediction uncertainty reduces by considering the thermodynamic constraints.
Efficient and Multiply Robust Risk Estimation under General Forms of Dataset Shift
Qiu, Hongxiang, Tchetgen, Eric Tchetgen, Dobriban, Edgar
Statistical machine learning methods often face the challenge of limited data available from the population of interest. One remedy is to leverage data from auxiliary source populations, which share some conditional distributions or are linked in other ways with the target domain. Techniques leveraging such \emph{dataset shift} conditions are known as \emph{domain adaptation} or \emph{transfer learning}. Despite extensive literature on dataset shift, limited works address how to efficiently use the auxiliary populations to improve the accuracy of risk evaluation for a given machine learning task in the target population. In this paper, we study the general problem of efficiently estimating target population risk under various dataset shift conditions, leveraging semiparametric efficiency theory. We consider a general class of dataset shift conditions, which includes three popular conditions -- covariate, label and concept shift -- as special cases. We allow for partially non-overlapping support between the source and target populations. We develop efficient and multiply robust estimators along with a straightforward specification test of these dataset shift conditions. We also derive efficiency bounds for two other dataset shift conditions, posterior drift and location-scale shift. Simulation studies support the efficiency gains due to leveraging plausible dataset shift conditions.
Uncertainty Informed Optimal Resource Allocation with Gaussian Process based Bayesian Inference
We focus on the problem of uncertainty informed allocation of medical resources (vaccines) to heterogeneous populations for managing epidemic spread. We tackle two related questions: (1) For a compartmental ordinary differential equation (ODE) model of epidemic spread, how can we estimate and integrate parameter uncertainty into resource allocation decisions? (2) How can we computationally handle both nonlinear ODE constraints and parameter uncertainties for a generic stochastic optimization problem for resource allocation? To the best of our knowledge current literature does not fully resolve these questions. Here, we develop a data-driven approach to represent parameter uncertainty accurately and tractably in a novel stochastic optimization problem formulation. We first generate a tractable scenario set by estimating the distribution on ODE model parameters using Bayesian inference with Gaussian processes. Next, we develop a parallelized solution algorithm that accounts for scenario-dependent nonlinear ODE constraints. Our scenario-set generation procedure and solution approach are flexible in that they can handle any compartmental epidemiological ODE model. Our computational experiments on two different non-linear ODE models (SEIR and SEPIHR) indicate that accounting for uncertainty in key epidemiological parameters can improve the efficacy of time-critical allocation decisions by 4-8%. This improvement can be attributed to data-driven and optimal (strategic) nature of vaccine allocations, especially in the early stages of the epidemic when the allocation strategy can crucially impact the long-term trajectory of the disease.
From Query Tools to Causal Architects: Harnessing Large Language Models for Advanced Causal Discovery from Data
Ban, Taiyu, Chen, Lyvzhou, Wang, Xiangyu, Chen, Huanhuan
Large Language Models (LLMs) exhibit exceptional abilities for causal analysis between concepts in numerous societally impactful domains, including medicine, science, and law. Recent research on LLM performance in various causal discovery and inference tasks has given rise to a new ladder in the classical three-stage framework of causality. In this paper, we advance the current research of LLM-driven causal discovery by proposing a novel framework that combines knowledge-based LLM causal analysis with data-driven causal structure learning. To make LLM more than a query tool and to leverage its power in discovering natural and new laws of causality, we integrate the valuable LLM expertise on existing causal mechanisms into statistical analysis of objective data to build a novel and practical baseline for causal structure learning. We introduce a universal set of prompts designed to extract causal graphs from given variables and assess the influence of LLM prior causality on recovering causal structures from data. We demonstrate the significant enhancement of LLM expertise on the quality of recovered causal structures from data, while also identifying critical challenges and issues, along with potential approaches to address them. As a pioneering study, this paper aims to emphasize the new frontier that LLMs are opening for classical causal discovery and inference, and to encourage the widespread adoption of LLM capabilities in data-driven causal analysis.
Understanding Pathologies of Deep Heteroskedastic Regression
Wong-Toi, Eliot, Boyd, Alex, Fortuin, Vincent, Mandt, Stephan
Several recent studies have reported negative results when using heteroskedastic neural regression models to model real-world data. In particular, for overparameterized models, the mean and variance networks are powerful enough to either fit every single data point (while shrinking the predicted variances to zero), or to learn a constant prediction with an output variance exactly matching every predicted residual (i.e., explaining the targets as pure noise). This paper studies these difficulties from the perspective of statistical physics. We show that the observed instabilities are not specific to any neural network architecture but are already present in a field theory of an overparameterized conditional Gaussian likelihood model. Under light assumptions, we derive a nonparametric free energy that can be solved numerically. The resulting solutions show excellent qualitative agreement with empirical model fits on real-world data and, in particular, prove the existence of phase transitions, i.e., abrupt, qualitative differences in the behaviors of the regressors upon varying the regularization strengths on the two networks. Our work thus provides a theoretical explanation for the necessity to carefully regularize heteroskedastic regression models. Moreover, the insights from our theory suggest a scheme for optimizing this regularization which is quadratically more efficient than the naive approach.
Spatiotemporal Besov Priors for Bayesian Inverse Problems
Lan, Shiwei, Pasha, Mirjeta, Li, Shuyi
Fast development in science and technology has driven the need for proper statistical tools to capture special data features such as abrupt changes or sharp contrast. Many applications in the data science seek spatiotemporal reconstruction from a sequence of time-dependent objects with discontinuity or singularity, e.g. dynamic computerized tomography (CT) images with edges. Traditional methods based on Gaussian processes (GP) may not provide satisfactory solutions since they tend to offer over-smooth prior candidates. Recently, Besov process (BP) defined by wavelet expansions with random coefficients has been proposed as a more appropriate prior for this type of Bayesian inverse problems. While BP outperforms GP in imaging analysis to produce edge-preserving reconstructions, it does not automatically incorporate temporal correlation inherited in the dynamically changing images. In this paper, we generalize BP to the spatiotemporal domain (STBP) by replacing the random coefficients in the series expansion with stochastic time functions following Q-exponential process which governs the temporal correlation strength. Mathematical and statistical properties about STBP are carefully studied. A white-noise representation of STBP is also proposed to facilitate the point estimation through maximum a posterior (MAP) and the uncertainty quantification (UQ) by posterior sampling. Two limited-angle CT reconstruction examples and a highly non-linear inverse problem involving Navier-Stokes equation are used to demonstrate the advantage of the proposed STBP in preserving spatial features while accounting for temporal changes compared with the classic STGP and a time-uncorrelated approach.
Representation Learning via Variational Bayesian Networks
Barkan, Oren, Caciularu, Avi, Rejwan, Idan, Katz, Ori, Weill, Jonathan, Malkiel, Itzik, Koenigstein, Noam
In the recommender system community, this situation is known as the "cold-start" problem [7, 9], where rare ('cold') entities (e.g., We present Variational Bayesian Network (VBN) - a novel Bayesian unpopular items or new items that are introduced to the catalog) are entity representation learning model that utilizes hierarchical and often poorly represented due to insufficient statistics. In the natural relational side information and is particularly useful for modeling language processing community, where the focus is on learning entities in the "long-tail", where the data is scarce. VBN provides representations for words and phrases, a common mitigation is to better modeling for long-tail entities via two complementary mechanisms: increase the training set size by utilizing increasingly larger corpus First, VBN employs informative hierarchical priors that e.g., BERT [20, 39]. However, it was shown that even when enable information propagation between entities sharing common increasing the amount of co-occurrence data, the existence of rare, ancestors. Additionally, VBN models explicit relations between entities out-of-vocabulary entities persists [26, 50, 52, 53].
Sparse Representations, Inference and Learning
Lauditi, Clarissa, Troiani, Emanuele, Mézard, Marc
In recent years statistical physics has proven to be a valuable tool to probe into large dimensional inference problems such as the ones occurring in machine learning. Statistical physics provides analytical tools to study fundamental limitations in their solutions and proposes algorithms to solve individual instances. In these notes, based on the lectures by Marc M\'ezard in 2022 at the summer school in Les Houches, we will present a general framework that can be used in a large variety of problems with weak long-range interactions, including the compressed sensing problem, or the problem of learning in a perceptron. We shall see how these problems can be studied at the replica symmetric level, using developments of the cavity methods, both as a theoretical tool and as an algorithm.
Differentiable User Models
Hämäläinen, Alex, Çelikok, Mustafa Mert, Kaski, Samuel
Probabilistic user modeling is essential for building machine learning systems in the ubiquitous cases with humans in the loop. However, modern advanced user models, often designed as cognitive behavior simulators, are incompatible with modern machine learning pipelines and computationally prohibitive for most practical applications. We address this problem by introducing widely-applicable differentiable surrogates for bypassing this computational bottleneck; the surrogates enable computationally efficient inference with modern cognitive models. We show experimentally that modeling capabilities comparable to the only available solution, existing likelihood-free inference methods, are achievable with a computational cost suitable for online applications. Finally, we demonstrate how AI-assistants can now use cognitive models for online interaction in a menu-search task, which has so far required hours of computation during interaction.