Bayesian Inference
A new local time-decoupled squared Wasserstein-2 method for training stochastic neural networks to reconstruct uncertain parameters in dynamical systems
Xia, Mingtao, Shen, Qijing, Maini, Philip, Gaffney, Eamonn, Mogilner, Alex
Preprint submitted to Elsevier March 10, 2025 algorithms to solve such inverse-type problems advance different fields including inferring neural circuit dynamics from spiking data [42] in neuroscience, modeling and predicting complex weather patterns from historical data [9] in climate science, uncovering disease transmission dynamics from infection case counts over time [46] in epidemiology, and deducing reaction rates from experimental concentration-time profiles in reaction kinetics in biochemistry [30]. However, such inverse-type problems pose substantial mathematical and computational challenges, particularly when data are limited and noisy, motivating ongoing research into novel algorithms and theoretical frameworks to improve models' reconstruction accuracy and efficiency. In this paper, we study the inverse problem of inferring the distribution of model parameters for several dynamical systems including ordinary differential equations (ODEs), partial differential equations (PDEs), and stochastic differential equations (SDEs) from time-series data or spatiotemporal data. Existing methods for such problems can be broadly categorized into traditional statistical approaches and modern data-driven techniques. Traditional statistical methods often involve parameter estimation frameworks. For example, linear and nonlinear regression methods play a role in simpler systems where the functional form of the model is partially known [13]. Furthermore, maximum likelihood estimation and Bayesian inference methods [16, 33] are often adopted. Maximum likelihood estimation optimizes the likelihood of model parameter values in a proposed model from observed data, while Bayesian methods incorporate prior information and compute posterior distributions. These approaches are widely used in applications such as reaction network reconstruction and epidemiological modeling.
Enough Coin Flips Can Make LLMs Act Bayesian
Gupta, Ritwik, Corona, Rodolfo, Ge, Jiaxin, Wang, Eric, Klein, Dan, Darrell, Trevor, Chan, David M.
Large language models (LLMs) exhibit the ability to generalize given few-shot examples in their input prompt, an emergent capability known as in-context learning (ICL). We investigate whether LLMs utilize ICL to perform structured reasoning in ways that are consistent with a Bayesian framework or rely on pattern matching. Using a controlled setting of biased coin flips, we find that: (1) LLMs often possess biased priors, causing initial divergence in zero-shot settings, (2) in-context evidence outweighs explicit bias instructions, (3) LLMs broadly follow Bayesian posterior updates, with deviations primarily due to miscalibrated priors rather than flawed updates, and (4) attention magnitude has negligible effect on Bayesian inference. With sufficient demonstrations of biased coin flips via ICL, LLMs update their priors in a Bayesian manner.
Compositional World Knowledge leads to High Utility Synthetic data
Gaudi, Sachit, Sreekumar, Gautam, Boddeti, Vishnu
Machine learning systems struggle with robustness, under subpopulation shifts. This problem becomes especially pronounced in scenarios where only a subset of attribute combinations is observed during training -a severe form of subpopulation shift, referred as compositional shift. To address this problem, we ask the following question: Can we improve the robustness by training on synthetic data, spanning all possible attribute combinations? We first show that training of conditional diffusion models on limited data lead to incorrect underlying distribution. Therefore, synthetic data sampled from such models will result in unfaithful samples and does not lead to improve performance of downstream machine learning systems. To address this problem, we propose CoInD to reflect the compositional nature of the world by enforcing conditional independence through minimizing Fisher's divergence between joint and marginal distributions. We demonstrate that synthetic data generated by CoInD is faithful and this translates to state-of-the-art worst-group accuracy on compositional shift tasks on CelebA.
Propagating Model Uncertainty through Filtering-based Probabilistic Numerical ODE Solvers
Yao, Dingling, Tronarp, Filip, Bosch, Nathanael
Filtering-based probabilistic numerical solvers for ordinary differential equations (ODEs), also known as ODE filters, have been established as efficient methods for quantifying numerical uncertainty in the solution of ODEs. In practical applications, however, the underlying dynamical system often contains uncertain parameters, requiring the propagation of this model uncertainty to the ODE solution. In this paper, we demonstrate that ODE filters, despite their probabilistic nature, do not automatically solve this uncertainty propagation problem. To address this limitation, we present a novel approach that combines ODE filters with numerical quadrature to properly marginalize over uncertain parameters, while accounting for both parameter uncertainty and numerical solver uncertainty. Experiments across multiple dynamical systems demonstrate that the resulting uncertainty estimates closely match reference solutions. Notably, we show how the numerical uncertainty from the ODE solver can help prevent overconfidence in the propagated uncertainty estimates, especially when using larger step sizes. Our results illustrate that probabilistic numerical methods can effectively quantify both numerical and parametric uncertainty in dynamical systems.
Leveraging priors on distribution functions for multi-arm bandits
Vashishtha, Sumit, Maillard, Odalric-Ambrym
We introduce Dirichlet Process Posterior Sampling (DPPS), a Bayesian non-parametric algorithm for multi-arm bandits based on Dirichlet Process (DP) priors. Like Thompson-sampling, DPPS is a probability-matching algorithm, i.e., it plays an arm based on its posterior-probability of being optimal. Instead of assuming a parametric class for the reward generating distribution of each arm, and then putting a prior on the parameters, in DPPS the reward generating distribution is directly modeled using DP priors. DPPS provides a principled approach to incorporate prior belief about the bandit environment, and in the noninformative limit of the DP posteriors (i.e. Bayesian Bootstrap), we recover Non Parametric Thompson Sampling (NPTS), a popular non-parametric bandit algorithm, as a special case of DPPS. We employ stick-breaking representation of the DP priors, and show excellent empirical performance of DPPS in challenging synthetic and real world bandit environments. Finally, using an information-theoretic analysis, we show non-asymptotic optimality of DPPS in the Bayesian regret setup.
Poisoning Bayesian Inference via Data Deletion and Replication
Carreau, Matthieu, Naveiro, Roi, Caballero, William N.
Research in adversarial machine learning (AML) has shown that statistical models are vulnerable to maliciously altered data. However, despite advances in Bayesian machine learning models, most AML research remains concentrated on classical techniques. Therefore, we focus on extending the white-box model poisoning paradigm to attack generic Bayesian inference, highlighting its vulnerability in adversarial contexts. A suite of attacks are developed that allow an attacker to steer the Bayesian posterior toward a target distribution through the strategic deletion and replication of true observations, even when only sampling access to the posterior is available. Analytic properties of these algorithms are proven and their performance is empirically examined in both synthetic and real-world scenarios. With relatively little effort, the attacker is able to substantively alter the Bayesian's beliefs and, by accepting more risk, they can mold these beliefs to their will. By carefully constructing the adversarial posterior, surgical poisoning is achieved such that only targeted inferences are corrupted and others are minimally disturbed.
Flow-based Bayesian filtering for high-dimensional nonlinear stochastic dynamical systems
Wang, Xintong, Guan, Xiaofei, Guo, Ling, Wu, Hao
Bayesian filtering for high-dimensional nonlinear stochastic dynamical systems is a fundamental yet challenging problem in many fields of science and engineering. Existing methods face significant obstacles: Gaussian-based filters struggle with non-Gaussian distributions, while sequential Monte Carlo methods are computationally intensive and prone to particle degeneracy in high dimensions. Although generative models in machine learning have made significant progress in modeling high-dimensional non-Gaussian distributions, their inefficiency in online updating limits their applicability to filtering problems. To address these challenges, we propose a flow-based Bayesian filter (FBF) that integrates normalizing flows to construct a novel latent linear state-space model with Gaussian filtering distributions. This framework facilitates efficient density estimation and sampling using invertible transformations provided by normalizing flows, and it enables the construction of filters in a data-driven manner, without requiring prior knowledge of system dynamics or observation models. Numerical experiments demonstrate the superior accuracy and efficiency of FBF.
Convergence Rates for Softmax Gating Mixture of Experts
Nguyen, Huy, Ho, Nhat, Rinaldo, Alessandro
Mixture of experts (MoE) has recently emerged as an effective framework to advance the efficiency and scalability of machine learning models by softly dividing complex tasks among multiple specialized sub-models termed experts. Central to the success of MoE is an adaptive softmax gating mechanism which takes responsibility for determining the relevance of each expert to a given input and then dynamically assigning experts their respective weights. Despite its widespread use in practice, a comprehensive study on the effects of the softmax gating on the MoE has been lacking in the literature. To bridge this gap in this paper, we perform a convergence analysis of parameter estimation and expert estimation under the MoE equipped with the standard softmax gating or its variants, including a dense-to-sparse gating and a hierarchical softmax gating, respectively. Furthermore, our theories also provide useful insights into the design of sample-efficient expert structures. In particular, we demonstrate that it requires polynomially many data points to estimate experts satisfying our proposed \emph{strong identifiability} condition, namely a commonly used two-layer feed-forward network. In stark contrast, estimating linear experts, which violate the strong identifiability condition, necessitates exponentially many data points as a result of intrinsic parameter interactions expressed in the language of partial differential equations. All the theoretical results are substantiated with a rigorous guarantee.
LAPD: Langevin-Assisted Bayesian Active Learning for Physical Discovery
Kong, Cindy Xiangrui, Zheng, Haoyang, Lin, Guang
Discovering physical laws from data is a fundamental challenge in scientific research, particularly when high-quality data are scarce or costly to obtain. Traditional methods for identifying dynamical systems often struggle with noise sensitivity, inefficiency in data usage, and the inability to quantify uncertainty effectively. To address these challenges, we propose Langevin-Assisted Active Physical Discovery (LAPD), a Bayesian framework that integrates replica-exchange stochastic gradient Langevin Monte Carlo to simultaneously enable efficient system identification and robust uncertainty quantification (UQ). By balancing gradient-driven exploration in coefficient space and generating an ensemble of candidate models during exploitation, LAPD achieves reliable, uncertainty-aware identification with noisy data. In the face of data scarcity, the probabilistic foundation of LAPD further promotes the integration of active learning (AL) via a hybrid uncertainty-space-filling acquisition function. This strategy sequentially selects informative data to reduce data collection costs while maintaining accuracy. We evaluate LAPD on diverse nonlinear systems such as the Lotka-Volterra, Lorenz, Burgers, and Convection-Diffusion equations, demonstrating its robustness with noisy and limited data as well as superior uncertainty calibration compared to existing methods. The AL extension reduces the required measurements by around 60% for the Lotka-Volterra system and by around 40% for Burgers' equation compared to random data sampling, highlighting its potential for resource-constrained experiments. Our framework establishes a scalable, uncertainty-aware methodology for data-efficient discovery of dynamical systems, with broad applicability to problems where high-fidelity data acquisition is prohibitively expensive.
From Metaphor to Mechanism: How LLMs Decode Traditional Chinese Medicine Symbolic Language for Modern Clinical Relevance
Tang, Jiacheng, Wu, Nankai, Gao, Fan, Dai, Chengxiao, Zhao, Mengyao, Zhao, Xinjie
--Metaphorical expressions are abundant in Traditional Chinese Medicine (TCM), conveying complex disease mechanisms and holistic health concepts through culturally rich and often abstract terminology. Bridging these metaphors to anatomically driven Western medical (WM) concepts poses significant challenges for both automated language processing and real-world clinical practice. T o address this gap, we propose a novel multi-agent and chain-of-thought (CoT) framework designed to interpret TCM metaphors accurately and map them to WM pathophysiology. Specifically, our approach combines domain-specialized agents (TCM Expert, WM Expert) with a Coordinator Agent, leveraging stepwise chain-of-thought prompts to ensure transparent reasoning and conflict resolution. We detail a methodology for building a metaphor-rich TCM dataset, discuss strategies for effectively integrating multi-agent collaboration and CoT reasoning, and articulate the theoretical underpinnings that guide metaphor interpretation across distinct medical paradigms. We present a comprehensive system design and highlight both the potential benefits and limitations of our approach, while leaving placeholders for future experimental validation. Our work aims to support clinical decision-making, cross-system educational initiatives, and integrated healthcare research, ultimately offering a robust scaffold for reconciling TCM's symbolic language with the mechanistic focus of Western medicine.