Goto

Collaborating Authors

 Bayesian Learning


Multivariate and Online Transfer Learning with Uncertainty Quantification

arXiv.org Machine Learning

Untreated periodontitis causes inflammation within the supporting tissue of the teeth and can ultimately lead to tooth loss. Modeling periodontal outcomes is beneficial as they are difficult and time consuming to measure, but disparities in representation between demographic groups must be considered. There may not be enough participants to build group specific models and it can be ineffective, and even dangerous, to apply a model to participants in an underrepresented group if demographic differences were not considered during training. We propose an extension to RECaST Bayesian transfer learning framework. Our method jointly models multivariate outcomes, exhibiting significant improvement over the previous univariate RECaST method. Further, we introduce an online approach to model sequential data sets. Negative transfer is mitigated to ensure that the information shared from the other demographic groups does not negatively impact the modeling of the underrepresented participants. The Bayesian framework naturally provides uncertainty quantification on predictions. Especially important in medical applications, our method does not share data between domains. We demonstrate the effectiveness of our method in both predictive performance and uncertainty quantification on simulated data and on a database of dental records from the HealthPartners Institute.


Variational Bayesian Bow tie Neural Networks with Shrinkage

arXiv.org Machine Learning

Despite the dominant role of deep models in machine learning, limitations persist, including overconfident predictions, susceptibility to adversarial attacks, and underestimation of variability in predictions. The Bayesian paradigm provides a natural framework to overcome such issues and has become the gold standard for uncertainty estimation with deep models, also providing improved accuracy and a framework for tuning critical hyperparameters. However, exact Bayesian inference is challenging, typically involving variational algorithms that impose strong independence and distributional assumptions. Moreover, existing methods are sensitive to the architectural choice of the network. We address these issues by constructing a relaxed version of the standard feed-forward rectified neural network, and employing Polya-Gamma data augmentation tricks to render a conditionally linear and Gaussian model. Additionally, we use sparsity-promoting priors on the weights of the neural network for data-driven architectural design. To approximate the posterior, we derive a variational inference algorithm that avoids distributional assumptions and independence across layers and is a faster alternative to the usual Markov Chain Monte Carlo schemes.


Transmission Line Outage Probability Prediction Under Extreme Events Using Peter-Clark Bayesian Structural Learning

arXiv.org Artificial Intelligence

Recent years have seen a notable increase in the frequency and intensity of extreme weather events. With a rising number of power outages caused by these events, accurate prediction of power line outages is essential for safe and reliable operation of power grids. The Bayesian network is a probabilistic model that is very effective for predicting line outages under weather-related uncertainties. However, most existing studies in this area offer general risk assessments, but fall short of providing specific outage probabilities. In this work, we introduce a novel approach for predicting transmission line outage probabilities using a Bayesian network combined with Peter-Clark (PC) structural learning. Our approach not only enables precise outage probability calculations, but also demonstrates better scalability and robust performance, even with limited data. Case studies using data from BPA and NOAA show the effectiveness of this approach, while comparisons with several existing methods further highlight its advantages.


Hierarchical-Graph-Structured Edge Partition Models for Learning Evolving Community Structure

arXiv.org Artificial Intelligence

We propose a novel dynamic network model to capture evolving latent communities within temporal networks. To achieve this, we decompose each observed dynamic edge between vertices using a Poisson-gamma edge partition model, assigning each vertex to one or more latent communities through \emph{nonnegative} vertex-community memberships. Specifically, hierarchical transition kernels are employed to model the interactions between these latent communities in the observed temporal network. A hierarchical graph prior is placed on the transition structure of the latent communities, allowing us to model how they evolve and interact over time. Consequently, our dynamic network enables the inferred community structure to merge, split, and interact with one another, providing a comprehensive understanding of complex network dynamics. Experiments on various real-world network datasets demonstrate that the proposed model not only effectively uncovers interpretable latent structures but also surpasses other state-of-the art dynamic network models in the tasks of link prediction and community detection.


Just Leaf It: Accelerating Diffusion Classifiers with Hierarchical Class Pruning

arXiv.org Artificial Intelligence

Diffusion models, known for their generative capabilities, have recently shown unexpected potential in image classification tasks by using Bayes' theorem. However, most diffusion classifiers require evaluating all class labels for a single classification, leading to significant computational costs that can hinder their application in large-scale scenarios. To address this, we present a Hierarchical Diffusion Classifier (HDC) that exploits the inherent hierarchical label structure of a dataset. By progressively pruning irrelevant high-level categories and refining predictions only within relevant subcategories, i.e., leaf nodes, HDC reduces the total number of class evaluations. As a result, HDC can accelerate inference by up to 60% while maintaining and, in some cases, improving classification accuracy. Our work enables a new control mechanism of the trade-off between speed and precision, making diffusion-based classification more viable for real-world applications, particularly in large-scale image classification tasks.


On the physics of nested Markov models: a generalized probabilistic theory perspective

arXiv.org Artificial Intelligence

Determining potential probability distributions with a given causal graph is vital for causality studies. To bypass the difficulty in characterizing latent variables in a Bayesian network, the nested Markov model provides an elegant algebraic approach by listing exactly all the equality constraints on the observed variables. However, this algebraically motivated causal model comprises distributions outside Bayesian networks, and its physical interpretation remains vague. In this work, we inspect the nested Markov model through the lens of generalized probabilistic theory, an axiomatic framework to describe general physical theories. We prove that all the equality constraints defining the nested Markov model hold valid theory-independently. Yet, we show this model generally contains distributions not implementable even within such relaxed physical theories subjected to merely the relativity principles and mild probabilistic rules. To interpret the origin of such a gap, we establish a new causal model that defines valid distributions as projected from a high-dimensional Bell-type causal structure. The new model unveils inequality constraints induced by relativity principles, or equivalently high-dimensional conditional independences, which are absent in the nested Markov model. Nevertheless, we also notice that the restrictions on states and measurements introduced by the generalized probabilistic theory framework can pose additional inequality constraints beyond the new causal model. As a by-product, we discover a new causal structure exhibiting strict gaps between the distribution sets of a Bayesian network, generalized probabilistic theories, and the nested Markov model. We anticipate our results will enlighten further explorations on the unification of algebraic and physical perspectives of causality.


Fine-Grained Uncertainty Quantification via Collisions

arXiv.org Machine Learning

We propose a new approach for fine-grained uncertainty quantification (UQ) using a collision matrix. For a classification problem involving $K$ classes, the $K\times K$ collision matrix $S$ measures the inherent (aleatoric) difficulty in distinguishing between each pair of classes. In contrast to existing UQ methods, the collision matrix gives a much more detailed picture of the difficulty of classification. We discuss several possible downstream applications of the collision matrix, establish its fundamental mathematical properties, as well as show its relationship with existing UQ methods, including the Bayes error rate. We also address the new problem of estimating the collision matrix using one-hot labeled data. We propose a series of innovative techniques to estimate $S$. First, we learn a contrastive binary classifier which takes two inputs and determines if they belong to the same class. We then show that this contrastive classifier (which is PAC learnable) can be used to reliably estimate the Gramian matrix of $S$, defined as $G=S^TS$. Finally, we show that under very mild assumptions, $G$ can be used to uniquely recover $S$, a new result on stochastic matrices which could be of independent interest. Experimental results are also presented to validate our methods on several datasets.


BALI: Learning Neural Networks via Bayesian Layerwise Inference

arXiv.org Machine Learning

We introduce a new method for learning Bayesian neural networks, treating them as a stack of multivariate Bayesian linear regression models. The main idea is to infer the layerwise posterior exactly if we know the target outputs of each layer. We define these pseudo-targets as the layer outputs from the forward pass, updated by the backpropagated gradients of the objective function. The resulting layerwise posterior is a matrix-normal distribution with a Kronecker-factorized covariance matrix, which can be efficiently inverted. Our method extends to the stochastic mini-batch setting using an exponential moving average over natural-parameter terms, thus gradually forgetting older data. The method converges in few iterations and performs as well as or better than leading Bayesian neural network methods on various regression, classification, and out-of-distribution detection benchmarks.


The Statistical Accuracy of Neural Posterior and Likelihood Estimation

arXiv.org Machine Learning

These methods can approximate the likelihood through neural likelihood estimation (NLE) (Papamakarios et al., 2019) or directly target the posterior distribution with neural posterior estimation (NPE) (Greenberg et al., 2019; Lueckmann et al., 2017; Papamakarios and Murray, 2016), with NLE requiring subsequent Markov Chain Monte Carlo (MCMC) steps to produce posterior samples. The hallmark of these neural methods is their ability to accurately approximate complex posterior distributions using only forward simulations from the assumed model. While sequential methods iteratively refine the posterior estimate through multiple rounds of simulation, one-shot NPE and NLE methods perform inference in a single round, enabling amortized inference where a trained model can be reused for multiple datasets without retraining (see, e.g., Radev et al., 2020; Gloeckler et al., 2024). In particular, like the statistical methods of approximate Bayesian computation (ABC), see, e.g., Sisson et al. (2018) for a handbook treatment, and Martin et al. (2023) for a recent summary, and Bayesian synthetic likelihood (BSL), see, e.g., Wood (2010), Price et al. (2018) and Frazier et al. (2023), NPE and NLE first reduce the data down to a vector of statistics and then build an approximation to the resulting partial posterior by substituting likelihood evaluation with forward simulation from the assumed model. In contrast to the statistical methods for likelihood-free inference like ABC and BSL, NPE (respectively, NLE) approximates the posterior (resp., the likelihood) directly by fitting flexible conditional density estimators, usually neural-or flow-based approaches, using training data that is simulated from the assumed model space. The approximation that results from this training step is then directly used as a posterior in the context of NPE or as a likelihood in the case of NLE, with MCMC for this trained likelihood then used to produce draws from an approximate posterior.


Efficient Sample-optimal Learning of Gaussian Tree Models via Sample-optimal Testing of Gaussian Mutual Information

arXiv.org Machine Learning

Learning high-dimensional distributions is a significant challenge in machine learning and statistics. Classical research has mostly concentrated on asymptotic analysis of such data under suitable assumptions. While existing works [Bhattacharyya et al.: SICOMP 2023, Daskalakis et al.: STOC 2021, Choo et al.: ALT 2024] focus on discrete distributions, the current work addresses the tree structure learning problem for Gaussian distributions, providing efficient algorithms with solid theoretical guarantees. This is crucial as real-world distributions are often continuous and differ from the discrete scenarios studied in prior works. In this work, we design a conditional mutual information tester for Gaussian random variables that can test whether two Gaussian random variables are independent, or their conditional mutual information is at least $\varepsilon$, for some parameter $\varepsilon \in (0,1)$ using $\mathcal{O}(\varepsilon^{-1})$ samples which we show to be near-optimal. In contrast, an additive estimation would require $\Omega(\varepsilon^{-2})$ samples. Our upper bound technique uses linear regression on a pair of suitably transformed random variables. Importantly, we show that the chain rule of conditional mutual information continues to hold for the estimated (conditional) mutual information. As an application of such a mutual information tester, we give an efficient $\varepsilon$-approximate structure-learning algorithm for an $n$-variate Gaussian tree model that takes $\widetilde{\Theta}(n\varepsilon^{-1})$ samples which we again show to be near-optimal. In contrast, when the underlying Gaussian model is not known to be tree-structured, we show that $\widetilde{{{\Theta}}}(n^2\varepsilon^{-2})$ samples are necessary and sufficient to output an $\varepsilon$-approximate tree structure. We perform extensive experiments that corroborate our theoretical convergence bounds.