Bayesian Inference
A Generalized Framework for Multiscale State-Space Modeling with Nested Nonlinear Dynamics: An Application to Bayesian Learning under Switching Regimes
Vélez-Cruz, Nayely, Laubichler, Manfred D.
In complex systems, processes operate across multiple time scales, such as rapid fluctuations in environmental conditions, intermediate responses like population dynamics, and slower shifts such as ecosystem succession or climate change. These dynamics are often nested, with fast processes embedded within slower ones. Fine-scale, rapid changes can accumulate over time to influence large-scale trends, while slower processes provide the conditions for fast dynamics to unfold. This interplay between processes at different time scales can lead to transient behaviors, where a system remains in one dynamic state for an extended period before abruptly shifting to another [4]. In ecological systems, these dynamics often manifest as long transients--periods of apparent stability followed by sudden regime shifts. These shifts can occur without any obvious external trigger, driven instead by internal processes or responses to environmental variability [6]. During these phases, a system may exhibit consistent behavior over time before transitioning to a different dynamic regime, which could involve altered oscillatory patterns or a completely new structure. Such transitions are difficult to predict, as they are nonlinear, involve systems operating at multiple interacting scales, and are influenced by stochasticity [5, 2]. Understanding these multiscale and nonlinear interactions is essential for anticipating regime shifts, which are often most consequential at the coarsest time scales, where changes in slow-moving processes like ecosystem succession or long-term climate changes lead to impactful, irreversible transitions [6].
Bayesian Counterfactual Prediction Models for HIV Care Retention with Incomplete Outcome and Covariate Information
Oganisian, Arman, Hogan, Joseph, Sang, Edwin, DeLong, Allison, Mosong, Ben, Fraser, Hamish, Mwangi, Ann
Like many chronic diseases, human immunodeficiency virus (HIV) is managed over time at regular clinic visits. At each visit, patient features are assessed, treatments are prescribed, and a subsequent visit is scheduled. There is a need for data-driven methods for both predicting retention and recommending scheduling decisions that optimize retention. Prediction models can be useful for estimating retention rates across a range of scheduling options. However, training such models with electronic health records (EHR) involves several complexities. First, formal causal inference methods are needed to adjust for observed confounding when estimating retention rates under counterfactual scheduling decisions. Second, competing events such as death preclude retention, while censoring events render retention missing. Third, inconsistent monitoring of features such as viral load and CD4 count lead to covariate missingness. This paper presents an all-in-one approach for both predicting HIV retention and optimizing scheduling while accounting for these complexities. We formulate and identify causal retention estimands in terms of potential return-time under a hypothetical scheduling decision. Flexible Bayesian approaches are used to model the observed return-time distribution while accounting for competing and censoring events and form posterior point and uncertainty estimates for these estimands. We address the urgent need for data-driven decision support in HIV care by applying our method to EHR from the Academic Model Providing Access to Healthcare (AMPATH) - a consortium of clinics that treat HIV in Western Kenya.
A Bayesian Approach to Harnessing the Power of LLMs in Authorship Attribution
Hu, Zhengmian, Zheng, Tong, Huang, Heng
Authorship attribution aims to identify the origin or author of a document. Traditional approaches have heavily relied on manual features and fail to capture long-range correlations, limiting their effectiveness. Recent advancements leverage text embeddings from pre-trained language models, which require significant fine-tuning on labeled data, posing challenges in data dependency and limited interpretability. Large Language Models (LLMs), with their deep reasoning capabilities and ability to maintain long-range textual associations, offer a promising alternative. This study explores the potential of pre-trained LLMs in one-shot authorship attribution, specifically utilizing Bayesian approaches and probability outputs of LLMs. Our methodology calculates the probability that a text entails previous writings of an author, reflecting a more nuanced understanding of authorship. By utilizing only pre-trained models such as Llama-3-70B, our results on the IMDb and blog datasets show an impressive 85\% accuracy in one-shot authorship classification across ten authors. Our findings set new baselines for one-shot authorship analysis using LLMs and expand the application scope of these models in forensic linguistics. This work also includes extensive ablation studies to validate our approach.
Flow Matching for Posterior Inference with Simulator Feedback
Holzschuh, Benjamin, Thuerey, Nils
Flow-based generative modeling is a powerful tool for solving inverse problems in physical sciences that can be used for sampling and likelihood evaluation with much lower inference times than traditional methods. We propose to refine flows with additional control signals based on a simulator. Control signals can include gradients and a problem-specific cost function if the simulator is differentiable, or they can be fully learned from the simulator output. In our proposed method, we pretrain the flow network and include feedback from the simulator exclusively for finetuning, therefore requiring only a small amount of additional parameters and compute. We motivate our design choices on several benchmark problems for simulation-based inference and evaluate flow matching with simulator feedback against classical MCMC methods for modeling strong gravitational lens systems, a challenging inverse problem in astronomy. We demonstrate that including feedback from the simulator improves the accuracy by $53\%$, making it competitive with traditional techniques while being up to $67$x faster for inference.
Model-free Estimation of Latent Structure via Multiscale Nonparametric Maximum Likelihood
Multivariate distributions often carry latent structures that are difficult to identify and estimate, and which better reflect the data generating mechanism than extrinsic structures exhibited simply by the raw data. In this paper, we propose a model-free approach for estimating such latent structures whenever they are present, without assuming they exist a priori. Given an arbitrary density $p_0$, we construct a multiscale representation of the density and propose data-driven methods for selecting representative models that capture meaningful discrete structure. Our approach uses a nonparametric maximum likelihood estimator to estimate the latent structure at different scales and we further characterize their asymptotic limits. By carrying out such a multiscale analysis, we obtain coarseto-fine structures inherent in the original distribution, which are integrated via a model selection procedure to yield an interpretable discrete representation of it. As an application, we design a clustering algorithm based on the proposed procedure and demonstrate its effectiveness in capturing a wide range of latent structures.
Individualised recovery trajectories of patients with impeded mobility, using distance between probability distributions of learnt graphs
Zhang, Chuqiao, Grosan, Crina, Chakrabarty, Dalia
Patients who are undergoing physical rehabilitation, benefit from feedback that follows from reliable assessment of their cumulative performance attained at a given time. In this paper, we provide a method for the learning of the recovery trajectory of an individual patient, as they undertake exercises as part of their physical therapy towards recovery of their loss of movement ability, following a critical illness. The difference between the Movement Recovery Scores (MRSs) attained by a patient, when undertaking a given exercise routine on successive instances, is given by a statistical distance/divergence between the (posterior) probabilities of random graphs that are Bayesianly learnt using time series data on locations of 20 of the patient's joints, recorded on an e-platform as the patient exercises. This allows for the computation of the MRS on every occasion the patient undertakes this exercise, using which, the recovery trajectory is drawn. We learn each graph as a Random Geometric Graph drawn in a probabilistic metric space, and identify the closed-form marginal posterior of any edge of the graph, given the correlation structure of the multivariate time series data on joint locations. On the basis of our recovery learning, we offer recommendations on the optimal exercise routines for patients with given level of mobility impairment.
Stream-level flow matching from a Bayesian decision theoretic perspective
Flow matching (FM) is a family of training algorithms for fitting continuous normalizing flows (CNFs). A standard approach to FM, called conditional flow matching (CFM), exploits the fact that the marginal vector field of a CNF can be learned by fitting least-square regression to the so-called conditional vector field specified given one or both ends of the flow path. We show that viewing CFM training from a Bayesian decision theoretic perspective on parameter estimation opens the door to generalizations of CFM algorithms. We propose one such extension by introducing a CFM algorithm based on defining conditional probability paths given what we refer to as ``streams'', instances of latent stochastic paths that connect pairs of noise and observed data. Further, we advocate the modeling of these latent streams using Gaussian processes (GPs). The unique distributional properties of GPs, and in particular the fact that the velocity of a GP is still a GP, allows drawing samples from the resulting stream-augmented conditional probability path without simulating the actual streams, and hence the ``simulation-free" nature of CFM training is preserved. We show that this generalization of the CFM can substantially reduce the variance in the estimated marginal vector field at a moderate computational cost, thereby improving the quality of the generated samples under common metrics. Additionally, we show that adopting the GP on the streams allows for flexibly linking multiple related training data points (e.g., time series) and incorporating additional prior information. We empirically validate our claim through both simulations and applications to two hand-written image datasets.
Active Causal Structure Learning with Latent Variables: Towards Learning to Detour in Autonomous Robots
Riscos, Pablo de los, Corbacho, Fernando
Artificial General Intelligence (AGI) Agents and Robots must be able to cope with everchanging environments and tasks. They must be able to actively construct new internal causal models of their interactions with the environment when new structural changes take place in the environment. Thus, we claim that active causal structure learning with latent variables (ACSLWL) is a necessary component to build AGI agents and robots. This paper describes how a complex planning and expectation-based detour behavior can be learned by ACSLWL when, unexpectedly, and for the first time, the simulated robot encounters a sort of "transparent" barrier in its pathway towards its target. ACSWL consists of acting in the environment, discovering new causal relations, constructing new causal models, exploiting the causal models to maximize its expected utility, detecting possible latent variables when unexpected observations occur, and constructing new structures - internal causal models and optimal estimation of the associated parameters, to be able to cope efficiently with the new encountered situations. That is, the agent must be able to construct new causal internal models that transform a previously unexpected and inefficient (sub-optimal) situation, into a predictable situation with an optimal operating plan.
Flow Matching for Atmospheric Retrieval of Exoplanets: Where Reliability meets Adaptive Noise Levels
Gebhard, Timothy D., Wildberger, Jonas, Dax, Maximilian, Kofler, Annalena, Angerhausen, Daniel, Quanz, Sascha P., Schölkopf, Bernhard
Inferring atmospheric properties of exoplanets from observed spectra is key to understanding their formation, evolution, and habitability. Since traditional Bayesian approaches to atmospheric retrieval (e.g., nested sampling) are computationally expensive, a growing number of machine learning (ML) methods such as neural posterior estimation (NPE) have been proposed. We seek to make ML-based atmospheric retrieval (1) more reliable and accurate with verified results, and (2) more flexible with respect to the underlying neural networks and the choice of the assumed noise models. First, we adopt flow matching posterior estimation (FMPE) as a new ML approach to atmospheric retrieval. FMPE maintains many advantages of NPE, but provides greater architectural flexibility and scalability. Second, we use importance sampling (IS) to verify and correct ML results, and to compute an estimate of the Bayesian evidence. Third, we condition our ML models on the assumed noise level of a spectrum (i.e., error bars), thus making them adaptable to different noise models. Both our noise level-conditional FMPE and NPE models perform on par with nested sampling across a range of noise levels when tested on simulated data. FMPE trains about 3 times faster than NPE and yields higher IS efficiencies. IS successfully corrects inaccurate ML results, identifies model failures via low efficiencies, and provides accurate estimates of the Bayesian evidence. FMPE is a powerful alternative to NPE for fast, amortized, and parallelizable atmospheric retrieval. IS can verify results, thus helping to build confidence in ML-based approaches, while also facilitating model comparison via the evidence ratio. Noise level conditioning allows design studies for future instruments to be scaled up, for example, in terms of the range of signal-to-noise ratios.
Variational Bayes Decomposition for Inverse Estimation with Superimposed Multispectral Intensity
Asahara, Akinori, Osakabe, Yoshihiro, Mitsuya, Yamamoto, Morita, Hidekazu
A variational Bayesian inference for measured wave intensity, such as X-ray intensity, is proposed in this paper. The data is popular to obtain information about unobservable features of an object, such as a material sample and the components of it. The proposed method assumes particles represent the wave, and their behaviors are stochastically modeled. The inference is accurate even if the data is noisy because of a smooth prior setting. Moreover, in this paper, two experimental results show feasibility of the proposed method.