Bayesian Inference
The Constitutional Filter
Kohaut, Simon, Divo, Felix, Flade, Benedict, Dhami, Devendra Singh, Eggert, Julian, Kersting, Kristian
Predictions in environments where a mix of legal policies, physical limitations, and operational preferences impacts an agent's motion are inherently difficult. Since Neuro-Symbolic systems allow for differentiable information flow between deep learning and symbolic building blocks, they present a promising avenue for expressing such high-level constraints. While prior work has demonstrated how to establish novel planning setups, e.g., in advanced aerial mobility tasks, their application in prediction tasks has been underdeveloped. We present the Constitutional Filter (CoFi), a novel filter architecture leveraging a Neuro-Symbolic representation of an agent's rules, i.e., its constitution, to (i) improve filter accuracy, (ii) leverage expert knowledge, (iii) incorporate deep learning architectures, and (iv) account for uncertainties in the environments through probabilistic spatial relations. CoFi follows a general, recursive Bayesian estimation setting, making it compatible with a vast landscape of estimation techniques such as Particle Filters. To underpin the advantages of CoFi, we validate its performance on real-world marine data from the Automatic Identification System and official Electronic Navigational Charts.
Rate of Model Collapse in Recursive Training
Suresh, Ananda Theertha, Thangaraj, Andrew, Khandavally, Aditya Nanda Kishore
Given the ease of creating synthetic data from machine learning models, new models can potentially be trained on synthetic data generated by previous models. This recursive training process raises concerns about the long-term impact on model quality. As models are recursively trained on generated data from previous rounds, their ability to capture the nuances of the original human-generated data may degrade. This is often referred to as model collapse. In this work, we ask how fast model collapse occurs for some well-studied distribution families under maximum likelihood (ML or near ML) estimation during recursive training. Surprisingly, even for fundamental distributions such as discrete and Gaussian distributions, the exact rate of model collapse is unknown. We theoretically characterize the rate of collapse in these fundamental settings and complement it with experimental evaluations. Our results show that for discrete distributions, the time to forget a word is approximately linearly dependent on the number of times it occurred in the original corpus, and for Gaussian models, the standard deviation reduces to zero roughly at $n$ iterations, where $n$ is the number of samples at each iteration. Both of these findings imply that model forgetting, at least in these simple distributions under near ML estimation with many samples, takes a long time.
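The Gaussian setting described above can be explored with a small simulation. The following is a minimal sketch, not the authors' experimental setup: it assumes the original distribution is a standard normal, and the function name `recursive_gaussian_fit` and its parameters are hypothetical. Each round draws n samples from the current fit and refits the mean and standard deviation by maximum likelihood.

```python
import numpy as np

def recursive_gaussian_fit(n, rounds, seed=0):
    """Refit a Gaussian by maximum likelihood on samples from the previous fit.

    Returns the fitted standard deviation after each round.
    """
    rng = np.random.default_rng(seed)
    mu, sigma = 0.0, 1.0  # assumed "original" distribution: standard normal
    sigmas = []
    for _ in range(rounds):
        x = rng.normal(mu, sigma, size=n)  # synthetic data from the current model
        mu = x.mean()                      # ML estimate of the mean
        sigma = x.std()                    # ML estimate of the std (divides by n)
        sigmas.append(sigma)
    return np.array(sigmas)

# Run for many multiples of n so that the shrinkage is clearly visible.
sigmas = recursive_gaussian_fit(n=20, rounds=2000)
print("std after first round:", sigmas[0])
print("std after last round: ", sigmas[-1])
```

Plotting `sigmas` on a logarithmic axis for several values of n is one way to see how the number of rounds needed for the standard deviation to shrink depends on n.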
Bayesian penalized empirical likelihood and MCMC sampling
Chang, Jinyuan, Tang, Cheng Yong, Zhu, Yuanzheng
In this study, we introduce a novel methodological framework called Bayesian Penalized Empirical Likelihood (BPEL), designed to address the computational challenges inherent in empirical likelihood (EL) approaches. Our approach has two primary objectives: (i) to enhance the inherent flexibility of EL in accommodating diverse model conditions, and (ii) to facilitate the use of well-established Markov Chain Monte Carlo (MCMC) sampling schemes as a convenient alternative to the complex optimization typically required for statistical inference using EL. To achieve the first objective, we propose a penalized approach that regularizes the Lagrange multipliers, significantly reducing the dimensionality of the problem while accommodating a comprehensive set of model conditions. For the second objective, our study designs and thoroughly investigates two popular sampling schemes within the BPEL context. We demonstrate that the BPEL framework is highly flexible and efficient, enhancing the adaptability and practicality of EL methods. Our study highlights the practical advantages of using sampling techniques over traditional optimization methods for EL problems, showing rapid convergence to the global optima of posterior distributions and ensuring the effective resolution of complex statistical inference challenges.
A mixing time bound for Gibbs sampling from log-smooth log-concave distributions
Sampling from probability distributions in high dimensional spaces is a fundamental computational primitive; it forms the basis of efficient numerical methods for approximating arbitrary integrals. The problem statement is the following: given a density function π, compute a point x with density proportional to π(x). A general approach to solving this problem is to design a reversible, ergodic Markov chain with a unique stationary distribution that is equal to the target distribution from which samples are needed. It is often possible to design relatively simple chains with low per-iteration computational complexity that are fit for purpose by implementing the Metropolis-Hastings filter [1, 2], a rule by which to either accept the next step in the dynamics or stay put and so tailor the dynamics toward a specific stationary distribution. The resulting Metropolized or Markov Chain Monte Carlo algorithms are known to converge asymptotically to their stationary distributions under mild regularity conditions. Non-asymptotic rates of convergence or mixing times are comparatively few in number and are both algorithm- and target-specific. They are important because downstream estimators computed using samples drawn from a dynamics that has not converged will suffer from bias. The class of log-concave target distributions is of particular interest.
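The accept-or-stay rule described above can be illustrated with a random-walk Metropolis chain. This is a minimal sketch under stated assumptions, not the Gibbs sampler analyzed in the paper: the target is assumed to be a standard Gaussian (which is log-concave and log-smooth), the proposal is a symmetric Gaussian step, and the names `log_pi` and `random_walk_metropolis` are hypothetical.

```python
import numpy as np

def log_pi(x):
    """Unnormalized log-density of the assumed target: a standard Gaussian."""
    return -0.5 * np.dot(x, x)

def random_walk_metropolis(log_density, x0, steps, step_size, seed=0):
    """Random-walk Metropolis chain; returns the samples and the acceptance rate."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    logp = log_density(x)
    chain = np.empty((steps, x.size))
    accepted = 0
    for t in range(steps):
        # Symmetric proposal, so the Hastings correction cancels.
        proposal = x + step_size * rng.standard_normal(x.size)
        logp_prop = log_density(proposal)
        # Metropolis-Hastings filter: accept with probability
        # min(1, pi(proposal) / pi(x)); otherwise stay at the current point.
        if np.log(rng.uniform()) < logp_prop - logp:
            x, logp = proposal, logp_prop
            accepted += 1
        chain[t] = x
    return chain, accepted / steps

chain, acc_rate = random_walk_metropolis(
    log_pi, x0=np.zeros(2), steps=20000, step_size=1.0
)
burned = chain[5000:]  # discard early samples, taken before the chain has mixed
print("acceptance rate:", acc_rate)
print("sample mean:", burned.mean(axis=0))
print("sample variance:", burned.var(axis=0))
```

The burn-in length of 5000 is an arbitrary choice for this toy target. Knowing how many steps are actually needed is precisely what a non-asymptotic mixing time bound provides.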
A partial likelihood approach to tree-based density modeling and its application in Bayesian inference
Tree-based models for probability distributions are usually specified using a predetermined, data-independent collection of candidate recursive partitions of the sample space. To characterize an unknown target density in detail over the entire sample space, candidate partitions must have the capacity to expand deeply into all areas of the sample space with potential non-zero sampling probability. Such an expansive system of partitions often incurs prohibitive computational costs and makes inference prone to overfitting, especially in regions with little probability mass. Existing models typically make a compromise and rely on relatively shallow trees. This hampers one of the most desirable features of trees, their ability to characterize local features, and results in reduced statistical efficiency. Traditional wisdom suggests that this compromise is inevitable to ensure coherent likelihood-based reasoning, as a data-dependent partition system that allows deeper expansion only in regions with more observations would induce double dipping of the data and thus lead to inconsistent inference. We propose a simple strategy to restore coherency while allowing the candidate partitions to be data-dependent, using Cox's partial likelihood. This strategy parametrizes the tree-based sampling model according to the allocation of probability mass based on the observed data, and yet under appropriate specification, the resulting inference remains valid. Our partial likelihood approach is broadly applicable to existing likelihood-based methods and in particular to Bayesian inference on tree-based models. We give examples in density estimation in which the partial likelihood is endowed with existing priors on tree-based models and compare with the standard, full-likelihood approach. The results show substantial gains in estimation accuracy and computational efficiency from using the partial likelihood.
Learning from Summarized Data: Gaussian Process Regression with Sample Quasi-Likelihood
Gaussian process regression is a powerful Bayesian nonlinear regression method. Recent research has enabled the capture of many types of observations using non-Gaussian likelihoods. To deal with various tasks in spatial modeling, we benefit from this development. Difficulties still arise when we can only access summarized data consisting of representative features, summary statistics, and data point counts. Such situations occur frequently, primarily because of concerns about confidentiality and management costs associated with spatial data. This study tackles learning and inference using only summarized data within the framework of Gaussian process regression. To address this challenge, we analyze the approximation errors in the marginal likelihood and posterior distribution that arise from utilizing representative features. We also introduce the concept of sample quasi-likelihood, which facilitates learning and inference using only summarized data. Non-Gaussian likelihoods satisfying certain assumptions can be captured by specifying a variance function that characterizes a sample quasi-likelihood function. Theoretical and experimental results demonstrate that the approximation performance is influenced by the granularity of summarized data relative to the length scale of covariance functions. Experiments on a real-world dataset highlight the practicality of our method for spatial modeling.
An efficient search-and-score algorithm for ancestral graphs using multivariate information scores
Lagrange, Nikita, Isambert, Herve
We propose a greedy search-and-score algorithm for ancestral graphs, which include directed as well as bidirected edges, originating from unobserved latent variables. The normalized likelihood score of ancestral graphs is estimated in terms of multivariate information over relevant "ac-connected subsets" of vertices, C, that are connected through collider paths confined to the ancestor set of C. For computational efficiency, the proposed two-step algorithm relies on local information scores limited to the close surrounding vertices of each node (step 1) and edge (step 2). This computational strategy, although restricted to information contributions from ac-connected subsets containing up to two-collider paths, is shown to outperform state-of-the-art causal discovery methods on challenging benchmark datasets.
Leveraging Cardiovascular Simulations for In-Vivo Prediction of Cardiac Biomarkers
Manduchi, Laura, Wehenkel, Antoine, Behrmann, Jens, Pegolotti, Luca, Miller, Andy C., Sener, Ozan, Cuturi, Marco, Sapiro, Guillermo, Jacobsen, Jörn-Henrik
Whole-body hemodynamics simulators, which model blood flow and pressure waveforms as functions of physiological parameters, are now essential tools for studying cardiovascular systems. However, solving the corresponding inverse problem of mapping observations (e.g., arterial pressure waveforms at specific locations in the arterial network) back to plausible physiological parameters remains challenging. Leveraging recent advances in simulation-based inference, we cast this problem as statistical inference by training an amortized neural posterior estimator on a newly built large dataset of cardiac simulations that we publicly release. To better align simulated data with real-world measurements, we incorporate stochastic elements modeling exogenous effects. The proposed framework can further integrate in-vivo data sources to refine its predictive capabilities on real-world data. In silico, we demonstrate that the proposed framework enables finely quantifying uncertainty associated with individual measurements, allowing trustworthy prediction of four biomarkers of clinical interest--namely Heart Rate, Cardiac Output, Systemic Vascular Resistance, and Left Ventricular Ejection Time--from arterial pressure waveforms and photoplethysmograms. Furthermore, we validate the framework in vivo, where our method accurately captures temporal trends in CO and SVR monitoring on the VitalDB dataset. Finally, the predictive error made by the model monotonically increases with the predicted uncertainty, thereby directly supporting the automatic rejection of unusable measurements.
Reduced Order Models and Conditional Expectation
Systems may depend on parameters which one may control, or which serve to optimise the system, or are imposed externally, or they could be uncertain. This last case is taken as the "Leitmotiv" for the following. A reduced order model is produced from the full order model by some kind of projection onto a relatively low-dimensional manifold or subspace. The parameter dependent reduction process produces a function of the parameters into the manifold. One now wants to examine the relation between the full and the reduced state for all possible parameter values of interest. Similarly, in the field of machine learning, a function from the parameter set into the image space of the machine learning model is learned on a training set of samples, typically by minimising the mean-square error. This set may be seen as a sample from some probability distribution, and thus the training is an approximate computation of the expectation, giving an approximation to the conditional expectation, a special case of Bayesian updating where the Bayesian loss function is the mean-square error. This offers the possibility of having a combined look at these methods, and also of introducing more general loss functions.
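The link between mean-square-error training and the conditional expectation can be checked numerically. The following is a minimal sketch with entirely hypothetical choices: a scalar parameter p drawn uniformly from [-1, 1], a state u = sin(πp) plus Gaussian noise (so that the conditional expectation of u given p is sin(πp)), and a degree-7 polynomial as the function class. Least-squares fitting on the sample should then approximately recover sin(πp).

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(0)

# Hypothetical toy data: parameter samples p and noisy states u.
p = rng.uniform(-1.0, 1.0, size=5000)
true_cond_exp = np.sin(np.pi * p)                # E[u | p] by construction
u = true_cond_exp + 0.3 * rng.standard_normal(p.size)

# Minimise the empirical mean-square error over degree-7 polynomials in p.
coeffs = P.polyfit(p, u, deg=7)
approx = P.polyval(p, coeffs)

# Compare the fitted function with the known conditional expectation.
max_err = np.max(np.abs(approx - true_cond_exp))
print("max deviation from E[u | p]:", max_err)
```

Replacing the squared error with a different loss would target a different functional of the conditional distribution; for instance, the absolute error targets the conditional median.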
Fast Multi-Group Gaussian Process Factor Models
Gokcen, Evren, Jasper, Anna I., Kohn, Adam, Machens, Christian K., Yu, Byron M.
Gaussian processes are now commonly used in dimensionality reduction approaches tailored to neuroscience, especially to describe changes in high-dimensional neural activity over time. As recording capabilities expand to include neuronal populations across multiple brain areas, cortical layers, and cell types, interest in extending Gaussian process factor models to characterize multi-population interactions has grown. However, the cubic runtime scaling of current methods with the length of experimental trials and the number of recorded populations (groups) precludes their application to large-scale multi-population recordings. Here, we improve this scaling from cubic to linear in both trial length and group number. We present two approximate approaches to fitting multi-group Gaussian process factor models based on (1) inducing variables and (2) the frequency domain. Empirically, both methods achieved orders of magnitude speed-up with minimal impact on statistical performance, in simulation and on neural recordings of hundreds of neurons across three brain areas. The frequency domain approach, in particular, consistently provided the greatest runtime benefits with the fewest trade-offs in statistical performance. We further characterize the estimation biases introduced by the frequency domain approach and demonstrate effective strategies to mitigate them. This work enables a powerful class of analysis techniques to keep pace with the growing scale of multi-population recordings, opening new avenues for exploring brain function.