Bayesian Inference
Bayesian Networks: Architecture Working Explained
In today's rapidly advancing world of Artificial Intelligence (AI), the need for explainable AI is more critical than ever. As AI systems are increasingly integrated into daily life, it is crucial to understand how they make decisions and to have explanations for their actions. Bayesian networks, a powerful and versatile graphical modeling technique, are gaining prominence as a tool for building explainable AI models. In this blog, we demystify Bayesian networks and explore their relevance to AI: their fundamentals, their applications, and how they enable explainable AI.
A tutorial on the Bayesian statistical approach to inverse problems
Waqar, Faaiq G., Patel, Swati, Simon, Cory M.
Inverse problems are ubiquitous in the sciences and engineering. Two categories of inverse problems concerning a physical system are (1) estimate parameters in a model of the system from observed input-output pairs and (2) given a model of the system, reconstruct the input to it that caused some observed output. Applied inverse problems are challenging because a solution may (i) not exist, (ii) not be unique, or (iii) be sensitive to measurement noise contaminating the data. Bayesian statistical inversion (BSI) is an approach to tackle ill-posed and/or ill-conditioned inverse problems. Advantageously, BSI provides a "solution" that (i) quantifies uncertainty by assigning a probability to each possible value of the unknown parameter/input and (ii) incorporates prior information and beliefs about the parameter/input. Herein, we provide a tutorial of BSI for inverse problems, by way of illustrative examples dealing with heat transfer from ambient air to a cold lime fruit. First, we use BSI to infer a parameter in a dynamic model of the lime temperature from measurements of the lime temperature over time. Second, we use BSI to reconstruct the initial condition of the lime from a measurement of its temperature later in time. We demonstrate the incorporation of prior information, visualize the posterior distributions of the parameter/initial condition, and show posterior samples of lime temperature trajectories from the model. Our tutorial aims to reach a wide range of scientists and engineers.
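To make the first example concrete, here is a minimal sketch of Bayesian inversion for a single parameter on a grid. It assumes Newton's law of cooling as the lime temperature model, Gaussian measurement noise, and a Gaussian prior on the time constant. All temperatures, the noise level, the prior, and the synthetic data are made-up values for illustration, not the tutorial's measurements.

```python
import math

# Assumed model (Newton's law of cooling):
#   T(t) = T_air + (T0 - T_air) * exp(-t / lam)
T_AIR, T0 = 18.0, 6.0   # ambient and initial lime temperature, deg C (hypothetical)
SIGMA = 0.3             # assumed std. dev. of the measurement noise, deg C

def lime_temperature(t, lam):
    return T_AIR + (T0 - T_AIR) * math.exp(-t / lam)

# Synthetic "measurements": model output at lam = 1.0 h plus fixed small offsets.
times = [0.25 * k for k in range(1, 11)]
offsets = [0.2, -0.1, 0.3, -0.2, 0.0, 0.1, -0.3, 0.2, -0.1, 0.1]
data = [lime_temperature(t, 1.0) + e for t, e in zip(times, offsets)]

def log_prior(lam, mean=1.5, std=0.5):
    # Gaussian prior belief about the time constant (hypothetical values).
    return -0.5 * ((lam - mean) / std) ** 2

def log_likelihood(lam):
    # Independent Gaussian noise on every temperature reading.
    return sum(-0.5 * ((y - lime_temperature(t, lam)) / SIGMA) ** 2
               for t, y in zip(times, data))

# Posterior = prior * likelihood, normalised over a grid of candidate values.
grid = [0.2 + 0.01 * i for i in range(281)]          # 0.2 h .. 3.0 h
log_post = [log_prior(lam) + log_likelihood(lam) for lam in grid]
peak = max(log_post)                                  # subtract for stability
weights = [math.exp(lp - peak) for lp in log_post]
total = sum(weights)
posterior = [w / total for w in weights]

lam_map = grid[posterior.index(max(posterior))]       # most probable value
lam_mean = sum(lam * p for lam, p in zip(grid, posterior))
```

The `posterior` list assigns a probability to every candidate value of the parameter, which is the sense in which the Bayesian "solution" quantifies uncertainty rather than returning a single estimate.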
Causal models in string diagrams
The framework of causal models provides a principled approach to causal reasoning, applied today across many scientific domains. Here we present this framework in the language of string diagrams, interpreted formally using category theory. A class of string diagrams, called network diagrams, are in 1-to-1 correspondence with directed acyclic graphs. A causal model is given by such a diagram with its components interpreted as stochastic maps, functions, or general channels in a symmetric monoidal category with a 'copy-discard' structure (cd-category), turning a model into a single mathematical object that can be reasoned with intuitively and yet rigorously. Building on prior works by Fong and Jacobs, Kissinger and Zanasi, as well as Fritz and Klingler, we present diagrammatic definitions of causal models and functional causal models in a cd-category, generalising causal Bayesian networks and structural causal models, respectively. We formalise general interventions on a model, including but beyond do-interventions, and present the natural notion of an open causal model with inputs. We also give an approach to conditioning based on a normalisation box, allowing for causal inference calculations to be done fully diagrammatically. We define counterfactuals in this setup, and treat the problems of the identifiability of causal effects and counterfactuals fully diagrammatically. The benefits of such a presentation of causal models lie in foundational questions in causal reasoning and in their clarificatory role and pedagogical value. This work aims to be accessible to different communities, from causal model practitioners to researchers in applied category theory, and discusses many examples from the literature for illustration. Overall, we argue and demonstrate that causal reasoning according to the causal model framework is most naturally and intuitively done as diagrammatic reasoning.
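The difference between conditioning and a do-intervention can be illustrated with ordinary probability tables. The sketch below uses a hypothetical three-variable causal Bayesian network (Z -> X, Z -> Y, X -> Y, all binary) with made-up numbers; it does not use string diagrams or any categorical machinery.

```python
# Hypothetical causal Bayesian network: Z -> X, Z -> Y, X -> Y (all binary).
P_Z = {0: 0.5, 1: 0.5}                               # P(Z=z)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}                      # P(X=1 | Z=z)
P_Y1_GIVEN_XZ = {(0, 0): 0.1, (0, 1): 0.5,
                 (1, 0): 0.3, (1, 1): 0.7}           # P(Y=1 | X=x, Z=z)

def p_x_given_z(x, z):
    p1 = P_X1_GIVEN_Z[z]
    return p1 if x == 1 else 1.0 - p1

def p_y1_given_x(x):
    """Observational P(Y=1 | X=x): Z is averaged with weights P(z | x)."""
    num = sum(P_Z[z] * p_x_given_z(x, z) * P_Y1_GIVEN_XZ[(x, z)] for z in P_Z)
    den = sum(P_Z[z] * p_x_given_z(x, z) for z in P_Z)
    return num / den

def p_y1_do_x(x):
    """Interventional P(Y=1 | do(X=x)): the mechanism for X is removed,
    so Z keeps its marginal distribution (truncated factorisation)."""
    return sum(P_Z[z] * P_Y1_GIVEN_XZ[(x, z)] for z in P_Z)

observational = p_y1_given_x(1)
interventional = p_y1_do_x(1)
```

With these numbers, conditioning on X=1 gives about 0.62 while intervening gives 0.5; the gap comes from the confounder Z.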
Local convexity of the TAP free energy and AMP convergence for Z2-synchronization
Celentano, Michael, Fan, Zhou, Mei, Song
We study mean-field variational Bayesian inference using the TAP approach, for Z2-synchronization as a prototypical example of a high-dimensional Bayesian model. We show that for any signal strength $\lambda > 1$ (the weak-recovery threshold), there exists a unique local minimizer of the TAP free energy functional near the mean of the Bayes posterior law. Furthermore, the TAP free energy in a local neighborhood of this minimizer is strongly convex. Consequently, a natural-gradient/mirror-descent algorithm achieves linear convergence to this minimizer from a local initialization, which may be obtained by a constant number of iterates of Approximate Message Passing (AMP). This provides a rigorous foundation for variational inference in high dimensions via minimization of the TAP free energy. We also analyze the finite-sample convergence of AMP, showing that AMP is asymptotically stable at the TAP minimizer for any $\lambda > 1$, and is linearly convergent to this minimizer from a spectral initialization for sufficiently large $\lambda$. Such a guarantee is stronger than results obtainable by state evolution analyses, which only describe a fixed number of AMP iterations in the infinite-sample limit. Our proofs combine the Kac-Rice formula and Sudakov-Fernique Gaussian comparison inequality to analyze the complexity of critical points that satisfy strong convexity and stability conditions within their local neighborhoods.
Maximum-likelihood Estimators in Physics-Informed Neural Networks for High-dimensional Inverse Problems
Gusmão, Gabriel S., Medford, Andrew J.
Physics-informed neural networks (PINNs) have proven a suitable mathematical scaffold for solving inverse ordinary (ODE) and partial differential equations (PDE). Typical inverse PINNs are formulated as soft-constrained multi-objective optimization problems with several hyperparameters. In this work, we demonstrate that inverse PINNs can be framed in terms of maximum-likelihood estimators (MLE) to allow explicit error propagation from interpolation to the physical model space through Taylor expansion, without the need for hyperparameter tuning. We explore its application to high-dimensional coupled ODEs constrained by differential algebraic equations that are common in transient chemical and biological kinetics. Furthermore, we show that singular-value decomposition (SVD) of the ODE coupling matrices (reaction stoichiometry matrix) provides reduced uncorrelated subspaces in which PINN solutions can be represented and over which residuals can be projected. Finally, SVD bases serve as preconditioners for the inversion of covariance matrices in this hyperparameter-free robust application of MLE to "kinetics-informed neural networks".
Monitoring machine learning (ML)-based risk prediction algorithms in the presence of confounding medical interventions
Feng, Jean, Gossmann, Alexej, Pennello, Gene, Petrick, Nicholas, Sahiner, Berkman, Pirracchio, Romain
Performance monitoring of machine learning (ML)-based risk prediction models in healthcare is complicated by the issue of confounding medical interventions (CMI): when an algorithm predicts a patient to be at high risk for an adverse event, clinicians are more likely to administer prophylactic treatment and alter the very target that the algorithm aims to predict. A simple approach is to ignore CMI and monitor only the untreated patients, whose outcomes remain unaltered. In general, ignoring CMI may inflate Type I error because (i) untreated patients disproportionately represent those with low predicted risk and (ii) evolution in both the model and clinician trust in the model can induce complex dependencies that violate standard assumptions. Nevertheless, we show that valid inference is still possible if one monitors conditional performance and if either conditional exchangeability or time-constant selection bias holds. Specifically, we develop a new score-based cumulative sum (CUSUM) monitoring procedure with dynamic control limits. Through simulations, we demonstrate the benefits of combining model updating with monitoring and investigate how over-trust in a prediction model may delay detection of performance deterioration. Finally, we illustrate how these monitoring methods can be used to detect calibration decay of an ML-based risk calculator for postoperative nausea and vomiting during the COVID-19 pandemic.
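For readers unfamiliar with CUSUM charts, the sketch below shows the textbook one-sided CUSUM with a fixed control limit. It is not the score-based procedure with dynamic control limits described above, and it ignores confounding medical interventions entirely; the reference value, threshold, and loss sequence are hypothetical.

```python
def cusum_alarm(values, reference, threshold):
    """Return the index of the first alarm of a one-sided CUSUM, or None.

    Statistic: C_t = max(0, C_{t-1} + (x_t - reference));
    an alarm fires the first time C_t exceeds the threshold.
    """
    c = 0.0
    for t, x in enumerate(values):
        c = max(0.0, c + (x - reference))
        if c > threshold:
            return t
    return None

# Hypothetical per-patient losses: performance degrades from index 10 onwards.
losses = [0.0] * 10 + [1.0] * 10
alarm_at = cusum_alarm(losses, reference=0.5, threshold=2.0)
```

The statistic stays at zero while losses sit below the reference value and accumulates once they exceed it, so the alarm fires a few observations after the change rather than at it.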
Bayesian Weapon System Reliability Modeling with Cox-Weibull Neural Network
We propose to integrate weapon system features (such as weapon system manufacturer, deployment time and location, storage time and location, etc.) into a parameterized Cox-Weibull [1] reliability model via a neural network, like DeepSurv [2], to improve predictive maintenance. In parallel, we develop an alternative Bayesian model by parameterizing the Weibull parameters with a neural network and employing dropout methods such as Monte-Carlo (MC)-dropout for comparative purposes. Due to data collection procedures in weapon system testing, we employ a novel interval-censored log-likelihood which incorporates Markov chain Monte Carlo (MCMC) [3] sampling of the Weibull parameters during gradient descent optimization. We compare classification metrics such as receiver operating characteristic (ROC) area under the curve (AUC), precision-recall (PR) AUC, and F scores to show our model generally outperforms traditional powerful models such as XGBoost and the current standard conditional Weibull probability density estimation model.
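As background for the interval-censored likelihood, the sketch below evaluates the plain interval-censored Weibull log-likelihood, in which each unit is only known to have failed between two inspection times. It assumes fixed shape and scale parameters shared by all units; the neural-network parameterization and the MCMC sampling mentioned above are not shown, and the function names are illustrative.

```python
import math

def weibull_cdf(t, shape, scale):
    """P(T <= t) for a Weibull lifetime T with the given shape and scale."""
    if t <= 0.0:
        return 0.0
    if math.isinf(t):
        return 1.0
    return 1.0 - math.exp(-((t / scale) ** shape))

def interval_censored_loglik(intervals, shape, scale):
    """Sum of log P(left < T <= right) over (left, right) inspection pairs.

    Use right = float("inf") for units still working at the last inspection.
    """
    total = 0.0
    for left, right in intervals:
        prob = weibull_cdf(right, shape, scale) - weibull_cdf(left, shape, scale)
        total += math.log(max(prob, 1e-300))   # floor avoids log(0)
    return total

# Hypothetical inspection data (times in years).
observations = [(0.0, 2.0), (2.0, 5.0), (5.0, float("inf"))]
loglik = interval_censored_loglik(observations, shape=1.5, scale=4.0)
```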
PAC-Bayesian Learning of Aggregated Binary Activated Neural Networks with Probabilities over Representations
Fortier-Dubois, Louis, Letarte, Gaël, Leblanc, Benjamin, Laviolette, François, Germain, Pascal
Considering a probability distribution over parameters is known as an efficient strategy to learn a neural network with non-differentiable activation functions. We study the expectation of a probabilistic neural network as a predictor by itself, focusing on the aggregation of binary activated neural networks with normal distributions over real-valued weights. Our work leverages a recent analysis derived from the PAC-Bayesian framework that derives tight generalization bounds and learning procedures for the expected output value of such an aggregation, which is given by an analytical expression. While the combinatorial nature of the latter has been circumvented by approximations in previous works, we show that the exact computation remains tractable for deep but narrow neural networks, thanks to a dynamic programming approach. This leads us to a peculiar bound minimization learning algorithm for binary activated neural networks, where the forward pass propagates probabilities over representations instead of activation values. A stochastic counterpart that scales to wide architectures is proposed.
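The analytical expression for the expected output is easiest to see for a single neuron. The sketch below assumes one sign-activated neuron whose weights follow an isotropic Gaussian with identity covariance centred at `mu`; under that assumption the pre-activation w . x is Gaussian with mean mu . x and variance ||x||^2, so the expected sign reduces to an error function. This covers only the one-neuron case, not the dynamic programming computation for deep networks.

```python
import math

def expected_sign_output(mu, x):
    """E[sgn(w . x)] for w ~ N(mu, I), i.e. erf(mu . x / (sqrt(2) * ||x||))."""
    dot = sum(m * xi for m, xi in zip(mu, x))
    norm = math.sqrt(sum(xi * xi for xi in x))
    return math.erf(dot / (math.sqrt(2.0) * norm))

# Mean weights aligned with the input: the neuron outputs +1 most of the time.
aligned = expected_sign_output([1.0, 0.0], [1.0, 0.0])
# Mean weights orthogonal to the input: +1 and -1 are equally likely.
orthogonal = expected_sign_output([0.0, 1.0], [1.0, 0.0])
```

Unlike the sign function itself, this expectation is smooth in `mu`, which is what makes gradient-based learning possible despite the non-differentiable activation.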
A Robust Test for Elliptical Symmetry
Most signal processing and statistical applications heavily rely on specific data distribution models. Gaussian distributions, although the most common choice, are inadequate in most real-world scenarios as they fail to account for data coming from heavy-tailed populations or contaminated by outliers. Such problems call for the use of Robust Statistics. The robust models and estimators are usually based on elliptical populations, making the latter ubiquitous in all methods of robust statistics. To determine whether such tools are applicable in any specific case, goodness-of-fit (GoF) tests are used to verify the ellipticity hypothesis. Ellipticity GoF tests are usually hard to analyze and often their statistical power is not particularly strong. In this work, assuming the true covariance matrix is unknown, we design and rigorously analyze a robust GoF test consistent against all alternatives to ellipticity on the unit sphere. The proposed test is based on Tyler's estimator and is formulated in terms of easily computable statistics of the data. For its rigorous analysis, we develop a novel framework based on the exchangeable random variables calculus introduced by de Finetti. Our findings are supported by numerical simulations comparing them to other popular GoF tests and demonstrating the significantly higher statistical power of the suggested technique.
Bayesian inference on Brain-Computer Interface using the GLASS Model
Zhao, Bangyao, Huggins, Jane E., Kang, Jian
The brain-computer interface (BCI) enables individuals with severe physical impairments to communicate with the world. BCIs offer computational neuroscience opportunities and challenges in converting real-time brain activities to computer commands and are typically framed as a classification problem. This article focuses on the P300 BCI that uses the event-related potential (ERP) BCI design, where the primary challenge is classifying target/non-target stimuli. We develop a novel Gaussian latent group model with sparse time-varying effects (GLASS) for making Bayesian inferences on the P300 BCI. GLASS adopts a multinomial regression framework that directly addresses the dataset imbalance in BCI applications. The prior specifications facilitate i) feature selection and noise reduction using soft-thresholding, ii) smoothing of the time-varying effects using global shrinkage, and iii) clustering of latent groups to alleviate high spatial correlations of EEG data. We develop an efficient gradient-based variational inference (GBVI) algorithm for posterior computation and provide a user-friendly Python module available at https://github.com/BangyaoZhao/GLASS. The application of GLASS identifies important EEG channels (PO8, Oz, PO7, Pz, C3) that align with existing literature. GLASS further reveals a group effect from channels in the parieto-occipital region (PO8, Oz, PO7), which is validated in cross-participant analysis.
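The soft-thresholding mentioned in item i) refers to a standard shrinkage operator, sketched below in its generic scalar form. How GLASS embeds it in the prior specification is not reproduced here; the function name and example values are illustrative.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; anything within it becomes 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

# Small effects are zeroed out (feature selection); large ones are shrunk.
effects = [1.5, -0.3, -2.0, 0.2]
shrunk = [soft_threshold(e, 1.0) for e in effects]
```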