Bayesian Inference
Normalizing Flow to Augmented Posterior: Conditional Density Estimation with Interpretable Dimension Reduction for High Dimensional Data
Zeng, Cheng, Michailidis, George, Iyatomi, Hitoshi, Duan, Leo L
The conditional density characterizes the distribution of a response variable $y$ given other predictor $x$, and plays a key role in many statistical tasks, including classification and outlier detection. Although there has been abundant work on the problem of Conditional Density Estimation (CDE) for a low-dimensional response in the presence of a high-dimensional predictor, little work has been done for a high-dimensional response such as images. The promising performance of normalizing flow (NF) neural networks in unconditional density estimation acts a motivating starting point. In this work, we extend NF neural networks when external $x$ is present. Specifically, they use the NF to parameterize a one-to-one transform between a high-dimensional $y$ and a latent $z$ that comprises two components \([z_P,z_N]\). The $z_P$ component is a low-dimensional subvector obtained from the posterior distribution of an elementary predictive model for $x$, such as logistic/linear regression. The $z_N$ component is a high-dimensional independent Gaussian vector, which explains the variations in $y$ not or less related to $x$. Unlike existing CDE methods, the proposed approach, coined Augmented Posterior CDE (AP-CDE), only requires a simple modification on the common normalizing flow framework, while significantly improving the interpretation of the latent component, since $z_P$ represents a supervised dimension reduction. In image analytics applications, AP-CDE shows good separation of $x$-related variations due to factors such as lighting condition and subject id, from the other random variations. Further, the experiments show that an unconditional NF neural network, based on an unsupervised model of $z$, such as Gaussian mixture, fails to generate interpretable results.
Determination of Particle-Size Distributions from Light-Scattering Measurement Using Constrained Gaussian Process Regression
Seyedheydari, Fahime, Nasiri, Mahdi, Mińkowski, Marcin, Särkkä, Simo
In this work, we propose a novel methodology for robustly estimating particle size distributions from optical scattering measurements using constrained Gaussian process regression. The estimation of particle size distributions is commonly formulated as a Fredholm integral equation of the first kind, an ill-posed inverse problem characterized by instability due to measurement noise and limited data. To address this, we use a Gaussian process prior to regularize the solution and integrate a normalization constraint into the Gaussian process via two approaches: by constraining the Gaussian process using a pseudo-measurement and by using Lagrange multipliers in the equivalent optimization problem. To improve computational efficiency, we employ a spectral expansion of the covariance kernel using eigenfunctions of the Laplace operator, resulting in a computationally tractable low-rank representation without sacrificing accuracy. Additionally, we investigate two complementary strategies for hyperparameter estimation: a data-driven approach based on maximizing the unconstrained log marginal likelihood, and an alternative approach where the physical constraints are taken into account. Numerical experiments demonstrate that the proposed constrained Gaussian process regression framework accurately reconstructs particle size distributions, producing numerically stable, smooth, and physically interpretable results. This methodology provides a principled and efficient solution for addressing inverse scattering problems and related ill-posed integral equations.
Generative Regression with IQ-BART
O'Hagan, Sean, Ročková, Veronika
Implicit Quantile BART (IQ-BART) posits a non-parametric Bayesian model on the conditional quantile function, acting as a model over a conditional model for $Y$ given $X$. One of the key ingredients is augmenting the observed data $\{(Y_i,X_i)\}_{i=1}^n$ with uniformly sampled values $τ_i$ for $1\leq i\leq n$ which serve as training data for quantile function estimation. Using the fact that the location parameter $μ$ in a $τ$-tilted asymmetric Laplace distribution corresponds to the $τ^{th}$ quantile, we build a check-loss likelihood targeting $μ$ as the parameter of interest. We equip the check-loss likelihood parametrized by $μ=f(X,τ)$ with a BART prior on $f(\cdot)$, allowing the conditional quantile function to vary both in $X$ and $τ$. The posterior distribution over $μ(τ,X)$ can be then distilled for estimation of the {\em entire quantile function} as well as for assessing uncertainty through the variation of posterior draws. Simulation-based predictive inference is immediately available through inverse transform sampling using the learned quantile function. The sum-of-trees structure over the conditional quantile function enables flexible distribution-free regression with theoretical guarantees. As a byproduct, we investigate posterior mean quantile estimator as an alternative to the routine sample (posterior mode) quantile estimator. We demonstrate the power of IQ-BART on time series forecasting datasets where IQ-BART can capture multimodality in predictive distributions that might be otherwise missed using traditional parametric approaches.
Model selection for stochastic dynamics: a parsimonious and principled approach
This thesis focuses on the discovery of stochastic differential equations (SDEs) and stochastic partial differential equations (SPDEs) from noisy and discrete time series. A major challenge is selecting the simplest possible correct model from vast libraries of candidate models, where standard information criteria (AIC, BIC) are often limited. We introduce PASTIS (Parsimonious Stochastic Inference), a new information criterion derived from extreme value theory. Its penalty term, $n_\mathcal{B} \ln(n_0/p)$, explicitly incorporates the size of the initial library of candidate parameters ($n_0$), the number of parameters in the considered model ($n_\mathcal{B}$), and a significance threshold ($p$). This significance threshold represents the probability of selecting a model containing more parameters than necessary when comparing many models. Benchmarks on various systems (Lorenz, Ornstein-Uhlenbeck, Lotka-Volterra for SDEs; Gray-Scott for SPDEs) demonstrate that PASTIS outperforms AIC, BIC, cross-validation (CV), and SINDy (a competing method) in terms of exact model identification and predictive capability. Furthermore, real-world data can be subject to large sampling intervals ($Δt$) or measurement noise ($σ$), which can impair model learning and selection capabilities. To address this, we have developed robust variants of PASTIS, PASTIS-$Δt$ and PASTIS-$σ$, thus extending the applicability of the approach to imperfect experimental data. PASTIS thus provides a statistically grounded, validated, and practical methodological framework for discovering simple models for processes with stochastic dynamics.
Transfer Learning in Infinite Width Feature Learning Networks
Lauditi, Clarissa, Bordelon, Blake, Pehlevan, Cengiz
We develop a theory of transfer learning in infinitely wide neural networks where both the pretraining (source) and downstream (target) task can operate in a feature learning regime. We analyze both the Bayesian framework, where learning is described by a posterior distribution over the weights, and gradient flow training of randomly initialized networks trained with weight decay. Both settings track how representations evolve in both source and target tasks. The summary statistics of these theories are adapted feature kernels which, after transfer learning, depend on data and labels from both source and target tasks. Reuse of features during transfer learning is controlled by an elastic weight coupling which controls the reliance of the network on features learned during training on the source task. We apply our theory to linear and polynomial regression tasks as well as real datasets. Our theory and experiments reveal interesting interplays between elastic weight coupling, feature learning strength, dataset size, and source and target task alignment on the utility of transfer learning.
AL-SPCE -- Reliability analysis for nondeterministic models using stochastic polynomial chaos expansions and active learning
Pires, A., Moustapha, M., Marelli, S., Sudret, B.
Reliability analysis typically relies on deterministic simulators, which yield repeatable outputs for identical inputs. However, many real-world systems display intrinsic randomness, requiring stochastic simulators whose outputs are random variables. This inherent variability must be accounted for in reliability analysis. While Monte Carlo methods can handle this, their high computational cost is often prohibitive. To address this, stochastic emulators have emerged as efficient surrogate models capable of capturing the random response of simulators at reduced cost. Although promising, current methods still require large training sets to produce accurate reliability estimates, which limits their practicality for expensive simulations. This work introduces an active learning framework to further reduce the computational burden of reliability analysis using stochastic emulators. We focus on stochastic polynomial chaos expansions (SPCE) and propose a novel learning function that targets regions of high predictive uncertainty relevant to failure probability estimation. To quantify this uncertainty, we exploit the asymptotic normality of the maximum likelihood estimator. The resulting method, named active learning stochastic polynomial chaos expansions (AL-SPCE), is applied to three test cases. Results demonstrate that AL-SPCE maintains high accuracy in reliability estimates while significantly improving efficiency compared to conventional surrogate-based methods and direct Monte Carlo simulation. This confirms the potential of active learning in enhancing the practicality of stochastic reliability analysis for complex, computationally expensive models.
Treatment, evidence, imitation, and chat
Large language models are thought to have potential to aid in medical decision making. We investigate this here. We start with the treatment problem, the patient's core medical decision-making task, which is solved in collaboration with a healthcare provider. We discuss approaches to solving the treatment problem, including -- within evidence-based medicine -- trials and observational data. We then discuss the chat problem, and how this differs from the treatment problem -- in particular as it relates to imitation. We then discuss how a large language model might be used to solve the treatment problem and highlight some of the challenges that emerge. We finally discuss how these challenges relate to evidence-based medicine, and how this might inform next steps.
Adapting Probabilistic Risk Assessment for AI
Wisakanto, Anna Katariina, Rogero, Joe, Casheekar, Avyay M., Mallah, Richard
Modern general-purpose artificial intelligence (AI) systems present an urgent risk management challenge, as their rapidly evolving capabilities and potential for catastrophic harm outpace our ability to reliably assess their risks. Current methods often rely on selective testing and undocumented assumptions about risk priorities, frequently failing to make a serious attempt at assessing the set of pathways through which AI systems pose direct or indirect risks to society and the biosphere. This paper introduces the probabilistic risk assessment (PRA) for AI framework, adapting established PRA techniques from high-reliability industries (e.g., nuclear power, aerospace) for the new challenges of advanced AI. The framework guides assessors in identifying potential risks, estimating likelihood and severity bands, and explicitly documenting evidence, underlying assumptions, and analyses at appropriate granularities. The framework's implementation tool synthesizes the results into a risk report card with aggregated risk estimates from all assessed risks. It introduces three methodological advances: (1) Aspect-oriented hazard analysis provides systematic hazard coverage guided by a first-principles taxonomy of AI system aspects (e.g. capabilities, domain knowledge, affordances); (2) Risk pathway modeling analyzes causal chains from system aspects to societal impacts using bidirectional analysis and incorporating prospective techniques; and (3) Uncertainty management employs scenario decomposition, reference scales, and explicit tracing protocols to structure credible projections with novelty or limited data. Additionally, the framework harmonizes diverse assessment methods by integrating evidence into comparable, quantified absolute risk estimates for lifecycle decisions. We have implemented this as a workbook tool for AI developers, evaluators, and regulators.
Consistency of Learned Sparse Grid Quadrature Rules using NeuralODEs
Gottschalk, Hanno, Partow, Emil, Riedlinger, Tobias J.
This paper provides a proof of the consistency of sparse grid quadrature for numerical integration of high dimensional distributions. In a first step, a transport map is learned that normalizes the distribution to a noise distribution on the unit cube. This step is built on the statistical learning theory of neural ordinary differential equations, which has been established recently. Secondly, the composition of the generative map with the quantity of interest is integrated numerically using the Clenshaw-Curtis sparse grid quadrature. A decomposition of the total numerical error in quadrature error and statistical error is provided. As main result it is proven in the framework of empirical risk minimization that all error terms can be controlled in the sense of PAC (probably approximately correct) learning and with high probability the numerical integral approximates the theoretical value up to an arbitrary small error in the limit where the data set size is growing and the network capacity is increased adaptively.
Inherited or produced? Inferring protein production kinetics when protein counts are shaped by a cell's division history
Pessoa, Pedro, Martinez, Juan Andres, Vandenbroucke, Vincent, Delvigne, Frank, Pressé, Steve
Inferring protein production kinetics for dividing cells is complicated due to protein inheritance from the mother cell. For instance, fluorescence measurements -- commonly used to assess gene activation -- may reflect not only newly produced proteins but also those inherited through successive cell divisions. In such cases, observed protein levels in any given cell are shaped by its division history. As a case study, we examine activation of the glc3 gene in yeast involved in glycogen synthesis and expressed under nutrient-limiting conditions. We monitor this activity using snapshot fluorescence measurements via flow cytometry, where GFP expression reflects glc3 promoter activity. A naïve analysis of flow cytometry data ignoring cell division suggests many cells are active with low expression. Explicitly accounting for the (non-Markovian) effects of cell division and protein inheritance makes it impossible to write down a tractable likelihood -- a key ingredient in physics-inspired inference, defining the probability of observing data given a model. The dependence on a cell's division history breaks the assumptions of standard (Markovian) master equations, rendering traditional likelihood-based approaches inapplicable. Instead, we adapt conditional normalizing flows (a class of neural network models designed to learn probability distributions) to approximate otherwise intractable likelihoods from simulated data. In doing so, we find that glc3 is mostly inactive under stress, showing that while cells occasionally activate the gene, expression is brief and transient.