Uncertainty
Uncertainty quantification of neural network models of evolving processes via Langevin sampling
Safta, Cosmin, Jones, Reese E., Patel, Ravi G., Wonnacot, Raelynn, Bolintineanu, Dan S., Hamel, Craig M., Kramer, Sharlotte L. B.
We propose a scalable, approximate inference hypernetwork framework for a general model of history-dependent processes. The flexible data model is based on a neural ordinary differential equation (NODE) representing the evolution of internal states together with a trainable observation model subcomponent. The posterior distribution corresponding to the data model parameters (weights and biases) follows a stochastic differential equation with a drift term related to the score of the posterior that is learned jointly with the data model parameters. This Langevin sampling approach offers flexibility in balancing the computational budget between the evaluation cost of the data model and the approximation of the posterior density of its parameters. We demonstrate the performance of the hypernetwork on chemical reaction and material physics data and compare it to mean-field variational inference.
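The core computational idea is easy to state: parameters evolve under a stochastic differential equation whose drift is the score of the posterior, so that the trajectory's stationary distribution is the posterior itself. A minimal sketch of this update, assuming a scalar linear model with an analytic score (not the paper's jointly learned hypernetwork drift):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = w_true * x + noise; we sample the posterior over the scalar weight w.
x = rng.normal(size=50)
y = 2.0 * x + 0.1 * rng.normal(size=50)

def score(w, sigma2=0.01, prior_var=10.0):
    """Gradient of the log posterior: Gaussian likelihood + Gaussian prior."""
    grad_loglik = np.sum((y - w * x) * x) / sigma2
    grad_logprior = -w / prior_var
    return grad_loglik + grad_logprior

# Unadjusted Langevin dynamics: dw = score(w) dt + sqrt(2) dB_t, discretized.
eps, n_steps = 1e-4, 5000
w, samples = 0.0, []
for k in range(n_steps):
    w = w + eps * score(w) + np.sqrt(2 * eps) * rng.normal()
    samples.append(w)

print("posterior mean ~", np.mean(samples[1000:]))
```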
Meta-Dependence in Conditional Independence Testing
Mazaheri, Bijan, Zhang, Jiaqi, Uhler, Caroline
Constraint-based causal discovery algorithms utilize many statistical tests for conditional independence to uncover networks of causal dependencies. These approaches to causal discovery rely on an assumed correspondence between the graphical properties of a causal structure and the conditional independence properties of observed variables, known as the causal Markov condition and faithfulness. Finite data yields an empirical distribution that is "close" to the actual distribution. Across these many possible empirical distributions, the correspondence to the graphical properties can break down for different conditional independencies, and multiple violations can occur at the same time. We study this "meta-dependence" between conditional independence properties using the following geometric intuition: each conditional independence property constrains the space of possible joint distributions to a manifold. The "meta-dependence" between conditional independences is informed by the position of these manifolds relative to the true probability distribution. We provide a simple-to-compute measure of this meta-dependence using information projections and consolidate our findings empirically using both synthetic and real-world data.
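The geometric picture has a classical concrete instance: for discrete distributions, the information projection of a joint pmf onto the manifold where $X \perp Y \mid Z$ is $p(z)p(x|z)p(y|z)$, and the KL divergence to that projection equals the conditional mutual information $I(X;Y|Z)$. The sketch below illustrates this underlying fact; it is an illustrative computation, not the paper's specific meta-dependence measure:

```python
import numpy as np

rng = np.random.default_rng(1)

# A random joint pmf over (X, Y, Z), each binary: shape (2, 2, 2).
p = rng.random((2, 2, 2))
p /= p.sum()

# I-projection onto the manifold {q : X independent of Y given Z}:
# q(x, y, z) = p(z) p(x|z) p(y|z).
p_z = p.sum(axis=(0, 1))          # p(z)
p_xz = p.sum(axis=1)              # p(x, z)
p_yz = p.sum(axis=0)              # p(y, z)
q = p_xz[:, None, :] * p_yz[None, :, :] / p_z[None, None, :]

# KL(p || q) equals the conditional mutual information I(X; Y | Z).
kl = np.sum(p * np.log(p / q))
print("KL to CI manifold = I(X;Y|Z) =", kl)
```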
Kullback-Leibler excess risk bounds for exponential weighted aggregation in Generalized linear models
Aggregation methods have emerged as a powerful and flexible framework in statistical learning, providing unified solutions across diverse problems such as regression, classification, and density estimation. In the context of generalized linear models (GLMs), where responses follow exponential family distributions, aggregation offers an attractive alternative to classical parametric modeling. This paper investigates the problem of sparse aggregation in GLMs, aiming to approximate the true parameter vector by a sparse linear combination of predictors. We prove that an exponential weighted aggregation scheme yields a sharp oracle inequality for the Kullback-Leibler risk with leading constant equal to one, while also attaining the minimax-optimal rate of aggregation. These results are further enhanced by establishing high-probability bounds on the excess risk.
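To make the aggregation scheme concrete, here is a minimal, hypothetical sketch of exponential weighting over a dictionary of candidate coefficient vectors in logistic regression (an exponential-family GLM); the random dictionary, temperature, and implicit uniform prior over candidates are illustrative choices, not the paper's sparsity-inducing construction:

```python
import numpy as np

rng = np.random.default_rng(2)

# Logistic-regression data (an exponential-family GLM).
n, d = 200, 5
X = rng.normal(size=(n, d))
beta_true = np.array([1.5, 0.0, -2.0, 0.0, 0.0])   # sparse truth
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

def nll(beta):
    """Logistic negative log-likelihood."""
    eta = X @ beta
    return np.sum(np.log1p(np.exp(eta)) - y * eta)

# A small dictionary of candidate (sparse) coefficient vectors.
candidates = [rng.normal(size=d) * rng.binomial(1, 0.4, size=d)
              for _ in range(500)]

# Exponential weights: w_j proportional to exp(-nll(beta_j) / temperature).
temperature = 4.0
losses = np.array([nll(b) for b in candidates])
w = np.exp(-(losses - losses.min()) / temperature)
w /= w.sum()

beta_ewa = sum(wj * bj for wj, bj in zip(w, candidates))
print("EWA estimate:", np.round(beta_ewa, 2))
```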
Conditional Distribution Compression via the Kernel Conditional Mean Embedding
Broadbent, Dominic, Whiteley, Nick, Allison, Robert, Lovett, Tom
Existing distribution compression methods, like Kernel Herding (KH), were originally developed for unlabelled data. However, no existing approach directly compresses the conditional distribution of labelled data. To address this gap, we first introduce the Average Maximum Conditional Mean Discrepancy (AMCMD), a natural metric for comparing conditional distributions. We then derive a consistent estimator for the AMCMD and establish its rate of convergence. Next, we make a key observation: in the context of distribution compression, the cost of constructing a compressed set targeting the AMCMD can be reduced from $\mathcal{O}(n^3)$ to $\mathcal{O}(n)$. Building on this, we extend the idea of KH to develop Average Conditional Kernel Herding (ACKH), a linear-time greedy algorithm that constructs a compressed set targeting the AMCMD. To better understand the advantages of directly compressing the conditional distribution rather than doing so via the joint distribution, we introduce Joint Kernel Herding (JKH), a straightforward adaptation of KH designed to compress the joint distribution of labelled data. While herding methods provide a simple and interpretable selection process, they rely on a greedy heuristic. To explore alternative optimisation strategies, we propose Joint Kernel Inducing Points (JKIP) and Average Conditional Kernel Inducing Points (ACKIP), which jointly optimise the compressed set while maintaining linear complexity. Experiments show that directly preserving conditional distributions with ACKIP outperforms both joint distribution compression (via JKH and JKIP) and the greedy selection used in ACKH. Moreover, we see that JKIP consistently outperforms JKH.
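For intuition about the herding baseline, the following sketch implements greedy kernel herding on the joint space of labelled pairs, in the spirit of JKH; the product Gaussian kernel, unit bandwidths, and without-replacement selection from the data pool are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Labelled data: n pairs (x_i, y_i).
n = 400
X = rng.normal(size=(n, 2))
Y = np.sin(X[:, :1]) + 0.1 * rng.normal(size=(n, 1))

def gauss_gram(A, B, bw=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bw ** 2))

# Product kernel on the joint space: k((x,y),(x',y')) = k_x(x,x') * k_y(y,y').
K = gauss_gram(X, X) * gauss_gram(Y, Y)
mu = K.mean(axis=1)   # empirical kernel mean embedding, evaluated at the data

# Greedy herding: at each step pick the point that best matches the mean
# embedding given what has already been selected.
m = 20
selected = []
running = np.zeros(n)
for t in range(m):
    scores = mu - running / (t + 1)
    scores[selected] = -np.inf        # without-replacement selection
    j = int(np.argmax(scores))
    selected.append(j)
    running += K[:, j]

print("compressed-set indices:", selected)
```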
Conditional Independence Test Based on Transport Maps
He, Chenxuan, Gao, Yuan, Zhu, Liping, Huang, Jian
Testing conditional independence between two random vectors given a third is a fundamental and challenging problem in statistics, particularly in multivariate nonparametric settings due to the complexity of conditional structures. We propose a novel framework for testing conditional independence using transport maps. At the population level, we show that two well-defined transport maps can transform the conditional independence test into an unconditional independence test, which substantially simplifies the problem. These transport maps are estimated from data using conditional continuous normalizing flow models. Within this framework, we derive a test statistic and prove its consistency under both the null and alternative hypotheses. A permutation-based procedure is employed to evaluate the significance of the test. We validate the proposed method through extensive simulations and real-data analysis. Our numerical studies demonstrate the practical effectiveness of the proposed method for conditional independence testing.
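To convey the reduction, here is a deliberately crude sketch: transport $X \mid Z$ and $Y \mid Z$ to approximate reference noise, then run an unconditional independence test on the transported variables. The Nadaraya-Watson conditional normalization stands in for the paper's conditional continuous normalizing flows, and the correlation-based permutation test is a simplistic stand-in for the paper's statistic:

```python
import numpy as np

rng = np.random.default_rng(4)

# Data where X and Y are dependent only through Z (the null holds).
n = 1000
Z = rng.normal(size=n)
X = Z ** 2 + 0.5 * rng.normal(size=n)
Y = np.sin(Z) + 0.5 * rng.normal(size=n)

def conditional_normalize(V, Z, bw=0.3):
    """Crude conditional transport: map V | Z to (approximately) standard
    noise via Nadaraya-Watson estimates of the conditional mean and std."""
    W = np.exp(-((Z[:, None] - Z[None, :]) ** 2) / (2 * bw ** 2))
    W /= W.sum(axis=1, keepdims=True)
    mean = W @ V
    var = W @ (V ** 2) - mean ** 2
    return (V - mean) / np.sqrt(np.maximum(var, 1e-8))

U, V = conditional_normalize(X, Z), conditional_normalize(Y, Z)

# Unconditional independence test on the transported variables
# (permutation test on the correlation, for simplicity).
stat = abs(np.corrcoef(U, V)[0, 1])
perm = [abs(np.corrcoef(U, rng.permutation(V))[0, 1]) for _ in range(500)]
print("p-value ~", np.mean([s >= stat for s in perm]))
```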
Neural Posterior Estimation on Exponential Random Graph Models: Evaluating Bias and Implementation Challenges
Exponential random graph models (ERGMs) are flexible probabilistic frameworks for modelling statistical networks through a variety of network summary statistics. Because the ERGM likelihood is intractable, conventional Bayesian estimation relies on iterative exchange with an auxiliary variable; however, this approach scales poorly to large networks. Neural posterior estimation (NPE) is a recent advancement in simulation-based inference that uses a neural network based density estimator to infer the posterior for models with doubly intractable likelihoods for which simulations can be generated. While NPE has been successfully adopted in various fields such as cosmology, little research has investigated its use for ERGMs. Performing NPE on ERGMs not only offers a different angle on estimation for intractable ERGM likelihoods but also enables more efficient and scalable inference through the amortisation properties of NPE; we therefore investigate how NPE can be implemented effectively for ERGMs. In this study, we present the first systematic implementation of NPE for ERGMs, rigorously evaluating potential biases, interpreting the magnitudes of these biases, and comparing NPE fits against conventional Bayesian ERGM fits. More importantly, our work highlights ERGM-specific areas that may pose particular challenges for the adoption of NPE.
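For readers new to NPE, the amortised training loop has a simple skeleton: draw parameters from the prior, simulate data, and fit a conditional density estimator to the resulting pairs. The sketch below uses a toy simulator and a diagonal-Gaussian head purely as placeholders for ERGM graph simulation and the flow-based estimators used in practice:

```python
import torch, torch.nn as nn

torch.manual_seed(0)

# Stand-in simulator: in the paper's setting this would be ERGM graph
# simulation followed by summary statistics; here x = theta + noise.
def simulate(theta):
    return theta + 0.1 * torch.randn_like(theta)

# Simulate training pairs from the prior.
theta = torch.randn(5000, 2)          # prior draws
x = simulate(theta)

# Conditional density estimator q(theta | x): a diagonal-Gaussian head.
net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 4))

def loss_fn(theta, x):
    out = net(x)
    mean, log_std = out[:, :2], out[:, 2:]
    # Negative log-likelihood of theta under N(mean, diag(exp(log_std)^2)),
    # dropping the additive constant.
    return (log_std + 0.5 * ((theta - mean) / log_std.exp()) ** 2).sum(1).mean()

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(2000):
    opt.zero_grad()
    loss = loss_fn(theta, x)
    loss.backward()
    opt.step()

# Amortised inference: the trained net maps any observation to a posterior.
x_obs = torch.tensor([[0.3, -1.2]])
print("posterior mean estimate:", net(x_obs)[:, :2].detach())
```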
Improving the evaluation of samplers on multi-modal targets
Grenioux, Louis, Noble, Maxence, Gabrié, Marylou
Addressing multi-modality constitutes one of the major challenges of sampling. In this reflection paper, we advocate for a more systematic evaluation of samplers with respect to two sources of difficulty: mode separation and dimension. To this end, we propose a synthetic experimental setting that we illustrate on a selection of samplers, focusing on the challenging criterion of recovering the relative importance of modes. These evaluations are crucial for diagnosing the potential of samplers to handle multi-modality and therefore for driving progress in the field.
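As one concrete instance of the mode-weight criterion, the sketch below builds a two-mode Gaussian mixture target, treats exact draws as a stand-in for a sampler's output, and compares estimated against true mode weights via nearest-centre assignment; the assignment rule and the single (separation, dimension) setting are illustrative assumptions, where a real study would sweep both:

```python
import numpy as np

rng = np.random.default_rng(5)

# Target: two unit-variance Gaussian modes in d dimensions, unequal weights.
d, sep = 10, 6.0
weights_true = np.array([0.7, 0.3])
centers = np.stack([np.zeros(d), sep * np.ones(d) / np.sqrt(d)])

# Stand-in for a sampler's output: here, exact draws from the target.
n = 5000
labels = rng.choice(2, size=n, p=weights_true)
samples = centers[labels] + rng.normal(size=(n, d))

# Recovery of relative mode importance: assign each sample to the
# nearest mode centre and compare empirical and true weights.
d2 = ((samples[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
weights_est = np.bincount(d2.argmin(axis=1), minlength=2) / n
print("true:", weights_true, "estimated:", weights_est)
```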
From Observation to Orientation: an Adaptive Integer Programming Approach to Intervention Design
Elrefaey, Abdelmonem, Pan, Rong
Using both observational and experimental data, a causal discovery process can identify the causal relationships between variables. This work presents a unique adaptive intervention design paradigm in which causal directed acyclic graphs (DAGs) are effectively recovered under practical budget constraints. To choose treatments that optimize information gain under these constraints, an iterative integer programming (IP) approach is proposed, which drastically reduces the number of experiments required. Simulations over a broad range of graph sizes and edge densities are used to assess the effectiveness of the suggested approach. Results show that the proposed adaptive IP approach achieves full causal graph recovery with fewer intervention iterations and variable manipulations than random intervention baselines, and that it is flexible enough to accommodate a variety of practical constraints.
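One building block of such designs can be written as a small integer program. As a simplified, single-round illustration (not the paper's adaptive information-gain objective): choose a minimum set of intervention targets so that every still-undirected edge has at least one intervened endpoint, i.e., a vertex cover, which suffices to orient those edges:

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Undirected edges of a CPDAG whose orientations are still unknown
# (a hypothetical 6-node example).
n_vars = 6
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (1, 4)]

# Binary variable s_i = 1 if we intervene on node i.  Constraint: every
# undirected edge must have an intervened endpoint, s_i + s_j >= 1.
A = np.zeros((len(edges), n_vars))
for row, (i, j) in enumerate(edges):
    A[row, i] = A[row, j] = 1

res = milp(
    c=np.ones(n_vars),                      # minimise number of interventions
    constraints=LinearConstraint(A, lb=1, ub=np.inf),
    integrality=np.ones(n_vars),            # all variables integer
    bounds=Bounds(0, 1),                    # binary
)
print("intervene on:", np.flatnonzero(res.x > 0.5))
```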
Unifying and extending Diffusion Models through PDEs for solving Inverse Problems
Dasgupta, Agnimitra, da Cunha, Alexsander Marciano, Fardisi, Ali, Aminy, Mehrnegar, Binder, Brianna, Shaddy, Bryan, Oberai, Assad A
Diffusion models have emerged as powerful generative tools with applications in computer vision and scientific machine learning (SciML), where they have been used to solve large-scale probabilistic inverse problems. Traditionally, these models have been derived using principles of variational inference, denoising, statistical signal processing, and stochastic differential equations. In contrast to the conventional presentation, in this study we derive diffusion models using ideas from linear partial differential equations and demonstrate that this approach has several benefits that include a constructive derivation of the forward and reverse processes, a unified derivation of multiple formulations and sampling strategies, and the discovery of a new class of models. We also apply the conditional version of these models to solving canonical conditional density estimation problems and challenging inverse problems. These problems help establish benchmarks for systematically quantifying the performance of different formulations and sampling strategies in this study, and for future studies. Finally, we identify and implement a mechanism through which a single diffusion model can be applied to measurements obtained from multiple measurement operators. Taken together, the contents of this manuscript provide a new understanding and several new directions in the application of diffusion models to solving physics-based inverse problems.
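For orientation, the forward and reverse processes under discussion can be exercised end-to-end in closed form when the data distribution is Gaussian, since the score of every diffused marginal is then analytic. A minimal sketch of reverse-time sampling under the variance-preserving SDE (a standard formulation, not the paper's new model class):

```python
import numpy as np

rng = np.random.default_rng(6)

# Data distribution: N(m0, s0^2); its diffused marginals stay Gaussian,
# so the score is available in closed form.
m0, s0 = 2.0, 0.5
beta = 4.0                                  # constant noise schedule

def marginal(t):
    a = np.exp(-0.5 * beta * t)             # signal scaling at time t
    var = s0 ** 2 * a ** 2 + (1 - a ** 2)   # variance of x_t
    return a, var

def score(x, t):
    a, var = marginal(t)
    return -(x - m0 * a) / var              # grad log p_t(x)

# Reverse-time SDE via Euler-Maruyama, from t=1 down to t=0:
# dx = [-0.5*beta*x - beta*score(x,t)] dt + sqrt(beta) dW (run backwards).
n, steps = 10000, 1000
dt = 1.0 / steps
x = rng.normal(size=n)                      # start from the reference N(0,1)
for k in range(steps, 0, -1):
    t = k * dt
    drift = -0.5 * beta * x - beta * score(x, t)
    x = x - drift * dt + np.sqrt(beta * dt) * rng.normal(size=n)

print("sample mean/std:", x.mean(), x.std(), " target:", m0, s0)
```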
DDPM Score Matching and Distribution Learning
Chewi, Sinho, Kalavasis, Alkis, Mehrotra, Anay, Montasser, Omar
Score estimation is the backbone of score-based generative models (SGMs), especially denoising diffusion probabilistic models (DDPMs). A key result in this area shows that with accurate score estimates, SGMs can efficiently generate samples from any realistic data distribution (Chen et al., ICLR'23; Lee et al., ALT'23). This distribution learning result, where the learned distribution is implicitly that of the sampler's output, does not explain how score estimation relates to classical tasks of parameter and density estimation. This paper introduces a framework that reduces score estimation to these two tasks, with various implications for statistical and computational learning theory:

Parameter Estimation: Koehler et al. (ICLR'23) demonstrate that a score-matching variant is statistically inefficient for the parametric estimation of multimodal densities common in practice. In contrast, we show that under mild conditions, denoising score-matching in DDPMs is asymptotically efficient.

Density Estimation: By linking generation to score estimation, we lift existing score estimation guarantees to $(\epsilon,\delta)$-PAC density estimation, i.e., a function approximating the target log-density within $\epsilon$ on all but a $\delta$-fraction of the space. We provide (i) minimax rates for density estimation over Hölder classes and (ii) a quasi-polynomial PAC density estimation algorithm for the classical Gaussian location mixture model, building on and addressing an open problem from Gatmiry et al. (arXiv'24).

Lower Bounds for Score Estimation: Our framework offers the first principled method to prove computational lower bounds for score estimation across general distributions. As an application, we establish cryptographic lower bounds for score estimation in general Gaussian mixture models, conceptually recovering Song's (NeurIPS'24) result and advancing his key open problem.
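The denoising score matching objective at the centre of this discussion is compact enough to state in code: corrupt data at a random diffusion time and train a network to predict the injected noise, which is equivalent to estimating the score up to a known scaling. A minimal sketch on a 1-D mixture, with the schedule and architecture as toy assumptions:

```python
import torch, torch.nn as nn

torch.manual_seed(0)

# 1-D data from a two-component Gaussian mixture.
data = torch.cat([torch.randn(4096) - 2, torch.randn(4096) + 2]).unsqueeze(1)

# Noise-prediction network eps_hat(x_t, t); predicting the noise is
# equivalent to estimating the score up to a known scaling.
net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(),
                    nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))

T = 100
alpha_bar = torch.cumprod(1 - torch.linspace(1e-4, 0.02, T), dim=0)

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(2000):
    x0 = data[torch.randint(len(data), (256,))]
    t = torch.randint(T, (256, 1))
    ab = alpha_bar[t]
    eps = torch.randn_like(x0)
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * eps   # forward noising
    # DDPM loss: predict the injected noise from (x_t, t).
    eps_hat = net(torch.cat([x_t, t / T], dim=1))
    loss = ((eps_hat - eps) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

print("final denoising score-matching loss:", loss.item())
```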