Bayesian Inference
Randomized Physics-Informed Machine Learning for Uncertainty Quantification in High-Dimensional Inverse Problems
Zong, Yifei, Barajas-Solano, David, Tartakovsky, Alexandre M.
We propose a physics-informed machine learning method for uncertainty quantification in high-dimensional inverse problems. In this method, the states and parameters of partial differential equations (PDEs) are approximated with truncated conditional Karhunen-Loève expansions (CKLEs), which, by construction, match the measurements of the respective variables. The maximum a posteriori (MAP) solution of the inverse problem is formulated as a minimization problem over CKLE coefficients where the loss function is the sum of the norm of PDE residuals and the $\ell_2$ regularization term. This MAP formulation is known as the physics-informed CKLE (PICKLE) method. Uncertainty in the inverse solution is quantified in terms of the posterior distribution of CKLE coefficients, and we sample the posterior by solving a randomized PICKLE minimization problem, formulated by adding zero-mean Gaussian perturbations to the PICKLE loss function. We call the proposed approach the randomized PICKLE (rPICKLE) method. For linear and low-dimensional nonlinear problems (15 CKLE parameters), we show analytically and through comparison with Hamiltonian Monte Carlo (HMC) that the rPICKLE posterior converges to the true posterior given by Bayes' rule. For high-dimensional nonlinear problems with 2000 CKLE parameters, we numerically demonstrate that rPICKLE posteriors are highly informative: they provide mean estimates with an accuracy comparable to that of the MAP solution and a confidence interval that mostly covers the reference solution. We are not able to obtain the HMC posterior to validate rPICKLE's convergence to the true posterior because of HMC's prohibitive computational cost for the considered high-dimensional problems. Our results demonstrate the advantages of rPICKLE over HMC for approximately sampling high-dimensional posterior distributions subject to physics constraints.
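As a rough illustration of the randomization idea described above, the following Python sketch draws posterior samples for a linear Gaussian toy problem by solving one perturbed least-squares problem per sample. It is not the PICKLE loss: the matrix `A`, the noise level `sigma`, the prior scale `gamma`, and both function names are assumptions made for this example, and the PDE residual is replaced by a plain data misfit. In this linear setting the minimizers follow the Bayesian posterior exactly, which the closed-form `exact_posterior` makes easy to check.

```python
import numpy as np


def randomized_map_samples(A, y, sigma, gamma, n_samples, rng):
    """Sample by minimizing a randomly perturbed quadratic loss.

    Each sample minimizes
        ||A x - (y + eps)||^2 / sigma^2 + ||x - eta||^2 / gamma^2
    with eps ~ N(0, sigma^2 I) and eta ~ N(0, gamma^2 I).
    Toy linear stand-in for a physics-informed loss (assumption).
    """
    m, d = A.shape
    # Hessian of the quadratic loss (up to a factor of 2); same for every sample.
    H = A.T @ A / sigma**2 + np.eye(d) / gamma**2
    # Zero-mean Gaussian perturbations of the data term and the regularizer.
    eps = rng.normal(0.0, sigma, size=(n_samples, m))
    eta = rng.normal(0.0, gamma, size=(n_samples, d))
    # Row k holds A^T (y + eps_k) / sigma^2 + eta_k / gamma^2.
    rhs = (y + eps) @ A / sigma**2 + eta / gamma**2
    # Solve the normal equations for all samples at once.
    return np.linalg.solve(H, rhs.T).T


def exact_posterior(A, y, sigma, gamma):
    """Closed-form Gaussian posterior for the same toy model."""
    d = A.shape[1]
    cov = np.linalg.inv(A.T @ A / sigma**2 + np.eye(d) / gamma**2)
    mean = cov @ (A.T @ y) / sigma**2
    return mean, cov
```

Comparing the sample mean and covariance of `randomized_map_samples(...)` with the output of `exact_posterior(...)` gives a quick sanity check; for nonlinear residuals the agreement is in general only approximate.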
Greedy Grammar Induction with Indirect Negative Evidence
This paper offers a fresh look at the pumping lemma constant as an upper bound for the finite structural information of a Context Free Grammar. An objective function based on indirect negative evidence considers the occurrences, and non-occurrences, of a finite number of trees, encountered after a sufficiently long non-adversarial input presentation. This objective function has optimal substructure in the hypothesis space, giving rise to a greedy search learner. With this learner, a range of classes of Context Free Languages is shown to be learnable (identifiable in the limit) on an otherwise intractable hypothesis space.
An Information-Theoretic Analysis of Nonstationary Bandit Learning
In nonstationary bandit learning problems, the decision-maker must continually gather information and adapt their action selection as the latent state of the environment evolves. In each time period, some latent optimal action maximizes expected reward under the environment state. We view the optimal action sequence as a stochastic process, and take an information-theoretic approach to analyze attainable performance. We bound limiting per-period regret in terms of the entropy rate of the optimal action process. The bound applies to a wide array of problems studied in the literature and reflects the problem's information structure through its information-ratio.
Statistical inverse learning problems with random observations
Abhishake, Helin, Tapio, Mücke, Nicole
We provide an overview of recent progress in statistical inverse problems with random experimental design, covering both linear and nonlinear inverse problems. Different regularization schemes have been studied to produce robust and stable solutions. We discuss recent results in spectral regularization methods and regularization by projection, exploring both approaches within the context of Hilbert scales and presenting new insights particularly in regularization by projection. Additionally, we overview recent advancements in regularization using convex penalties. Convergence rates are analyzed in terms of the sample size in a probabilistic sense, yielding minimax rates in both expectation and probability. To achieve these results, the structure of reproducing kernel Hilbert spaces is leveraged to establish minimax rates in the statistical learning setting. We detail the assumptions underpinning these key elements of our proofs. Finally, we demonstrate the application of these concepts to nonlinear inverse problems in pharmacokinetic/pharmacodynamic (PK/PD) models, where the task is to predict changes in drug concentrations in patients.
Evaluating District-based Election Surveys with Synthetic Dirichlet Likelihood
In district-based multi-party elections, electors cast votes in their respective districts. In each district, the party with the most votes wins the corresponding seat in the governing body. Election surveys try to predict the election outcome (vote shares and seat shares of parties) by querying a random sample of electors. However, the survey results are often inconsistent with the actual results, which could be due to multiple reasons. The aim of this work is to estimate a posterior distribution over the possible outcomes of the election, given one or more survey results. This is achieved using a prior distribution over vote shares, election models to simulate the complete election from the vote share, and survey models to simulate survey results from a complete election. The desired posterior distribution over the space of possible outcomes is constructed using Synthetic Dirichlet Likelihoods, whose parameters are estimated from Monte Carlo sampling of elections using the election models. We further show that the same approach can also be used to evaluate the surveys, that is, to assess whether they were biased, based on the true outcome once it is known. Our work offers the first-ever probabilistic model to analyze district-based election surveys. We illustrate our approach with extensive experiments on real and simulated data of district-based political elections in India.
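To make the distinction between vote shares and seat shares concrete, here is a minimal Python sketch of the district rule stated above (the party with the most votes in a district wins its seat). The function name `election_outcome`, the array layout, and the tie-breaking rule (lowest party index wins a tie) are assumptions of this example, not part of the paper's election models.

```python
import numpy as np


def election_outcome(votes):
    """Return (vote_share, seat_share) for a districts-by-parties vote table.

    votes[i, j] is the number of votes for party j in district i.
    Ties within a district go to the lowest party index (assumption).
    """
    votes = np.asarray(votes)
    n_districts, n_parties = votes.shape
    # Overall vote share: party totals divided by all votes cast.
    vote_share = votes.sum(axis=0) / votes.sum()
    # The party with the most votes in each district wins that seat.
    winners = votes.argmax(axis=1)
    seat_share = np.bincount(winners, minlength=n_parties) / n_districts
    return vote_share, seat_share


votes = [
    [40, 35, 25],
    [45, 30, 25],
    [20, 60, 20],
    [34, 33, 33],
]
vote_share, seat_share = election_outcome(votes)
```

In this toy table the second party has the largest overall vote share (0.395) but wins only one of four seats, while the first party wins three seats with a vote share of 0.3475. A survey that estimates vote shares well can therefore still miss the seat outcome.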
Make Me a BNN: A Simple Strategy for Estimating Bayesian Uncertainty from Pre-trained Models
Franchi, Gianni, Laurent, Olivier, Leguéry, Maxence, Bursuc, Andrei, Pilzer, Andrea, Yao, Angela
Deep Neural Networks (DNNs) are powerful tools for various computer vision tasks, yet they often struggle with reliable uncertainty quantification, a critical requirement for real-world applications. Bayesian Neural Networks (BNNs) are equipped for uncertainty estimation but cannot scale to large DNNs that are highly unstable to train. To address this challenge, we introduce the Adaptable Bayesian Neural Network (ABNN), a simple and scalable strategy to seamlessly transform DNNs into BNNs in a post-hoc manner with minimal computational and training overheads. ABNN preserves the main predictive properties of DNNs while enhancing their uncertainty quantification abilities through simple BNN adaptation layers (attached to normalization layers) and a few fine-tuning steps on pre-trained models. We conduct extensive experiments across multiple datasets for image classification and semantic segmentation tasks, and our results demonstrate that ABNN achieves state-of-the-art performance without the computational budget typically associated with ensemble methods.
Information-seeking polynomial NARX model-predictive control through expected free energy minimization
We propose an adaptive model-predictive controller that balances driving the system to a goal state and seeking system observations that are informative with respect to the parameters of a nonlinear autoregressive exogenous model. The controller's objective function is derived from an expected free energy functional and contains information-theoretic terms expressing uncertainty over model parameters and output predictions. Experiments illustrate how parameter uncertainty affects the control objective and evaluate the proposed controller for a pendulum swing-up task.
DynGFN: Towards Bayesian Inference of Gene Regulatory Networks with GFlowNets
Atanackovic, Lazar, Tong, Alexander, Wang, Bo, Lee, Leo J., Bengio, Yoshua, Hartford, Jason
One of the grand challenges of cell biology is inferring the gene regulatory network (GRN) which describes interactions between genes and their products that control gene expression and cellular function. We can treat this as a causal discovery problem but with two non-standard challenges: (1) regulatory networks are inherently cyclic so we should not model a GRN as a directed acyclic graph (DAG), and (2) observations have significant measurement noise, so for typical sample sizes there will always be a large equivalence class of graphs that are likely given the data, and we want methods that capture this uncertainty. Existing methods either focus on challenge (1), identifying cyclic structure from dynamics, or on challenge (2) learning complex Bayesian posteriors over DAGs, but not both. In this paper we leverage the fact that it is possible to estimate the "velocity" of gene expression with RNA velocity techniques to develop an approach that addresses both challenges. Because we have access to velocity information, we can treat the Bayesian structure learning problem as a problem of sparse identification of a dynamical system, capturing cyclic feedback loops through time. Since our objective is to model uncertainty over discrete structures, we leverage Generative Flow Networks (GFlowNets) to estimate the posterior distribution over the combinatorial space of possible sparse dependencies. Our results indicate that our method learns posteriors that better encapsulate the distributions of cyclic structures compared to counterpart state-of-the-art Bayesian structure learning approaches.
Learning Rich Rankings
Seshadri, Arjun, Ragain, Stephen, Ugander, Johan
Although the foundations of ranking are well established, the ranking literature has primarily been focused on simple, unimodal models, e.g. the Mallows and Plackett-Luce models, that define distributions centered around a single total ordering. Explicit mixture models have provided some tools for modelling multimodal ranking data, though learning such models from data is often difficult. In this work, we contribute a contextual repeated selection (CRS) model that leverages recent advances in choice modeling to bring a natural multimodality and richness to the rankings space. We provide rigorous theoretical guarantees for maximum likelihood estimation under the model through structure-dependent tail risk and expected risk bounds. As a by-product, we also furnish the first tight bounds on the expected risk of maximum likelihood estimators for the multinomial logit (MNL) choice model and the Plackett-Luce (PL) ranking model, as well as the first tail risk bound on the PL ranking model. The CRS model significantly outperforms existing methods for modeling real world ranking data in a variety of settings, from racing to rank choice voting.
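For readers unfamiliar with the Plackett-Luce baseline mentioned above, the following Python sketch computes the log-probability of a ranking as a sequence of repeated selections: at each position, the next item is chosen from those remaining with probability proportional to its weight, which is a multinomial logit choice. This is the standard PL model only, not the contextual CRS model; the function name `pl_log_prob` and the log-weight parameterization are assumptions of this example.

```python
import itertools
import math


def pl_log_prob(ranking, log_weights):
    """Log-probability of `ranking` under a Plackett-Luce model.

    ranking     -- item indices, from most to least preferred
    log_weights -- log_weights[i] is the log-weight (utility) of item i
    """
    remaining = list(ranking)
    total = 0.0
    for item in ranking:
        # Softmax choice of `item` among the items not yet ranked.
        log_norm = math.log(sum(math.exp(log_weights[j]) for j in remaining))
        total += log_weights[item] - log_norm
        remaining.remove(item)
    return total


# Probabilities over all rankings of four items should sum to one.
log_w = [0.5, -1.0, 0.0, 2.0]
total_prob = sum(
    math.exp(pl_log_prob(perm, log_w))
    for perm in itertools.permutations(range(4))
)
```

With equal weights every ranking of n items has probability 1/n!, and whatever the weights, the model is centered on a single ordering (items sorted by weight). That is the unimodality the abstract refers to.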
SAVAE: Leveraging the variational Bayes autoencoder for survival analysis
Apellániz, Patricia A., Parras, Juan, Zazo, Santiago
In recent years, there has been a significant transformation in medical research methodologies towards the adoption of Deep Learning (DL) techniques for predicting critical events, such as disease development and patient mortality. Despite their potential to handle complex data, practical applications in this domain remain limited, with most studies still relying on traditional statistical methods. Survival Analysis (SA), or time-to-event analysis, is an essential tool for studying specific events in various disciplines, not only in medicine but also in fields such as recommendation systems [1], employee retention [2], market modeling [3], and financial risk assessment [4]. According to the existing literature, the Cox proportional hazards model (Cox-PH) [5] is the dominant SA method, offering a semiparametric regression solution to the non-parametric Kaplan-Meier estimator problem [6]. Unlike the Kaplan-Meier method, which uses a single covariate, Cox-PH incorporates multiple covariates to predict event times and assess their impact on the hazard rate at specific time points. However, it is crucial to acknowledge that the Cox-PH model is built on certain strong assumptions. One of these is the proportional hazards assumption, which posits that the hazard functions of different individuals are proportional, so that their ratio remains constant over time. The model also assumes a linear relation between the natural logarithm of the relative hazard (the ratio of the hazard at time t to the baseline hazard) and the covariates, as well as the absence of interactions among these covariates.
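The Cox-PH assumptions described above can be written compactly. The symbols are assumptions of this note rather than notation from the source: $h(t \mid x)$ is the hazard at time $t$ for covariate vector $x$, $h_0(t)$ is the baseline hazard, and $\beta$ is a vector of regression coefficients.

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^\top x\right),
\qquad
\log\frac{h(t \mid x)}{h_0(t)} = \beta^\top x,
\qquad
\frac{h(t \mid x_i)}{h(t \mid x_j)} = \exp\!\left(\beta^\top (x_i - x_j)\right).
```

The middle identity is the linear relation between the log relative hazard and the covariates. The last ratio does not depend on $t$, which is the proportional hazards property. The absence of interactions corresponds to $\beta^\top x$ containing no product terms between covariates.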