Bayesian Learning
Bayesian Numerical Integration with Neural Networks
Ott, Katharina, Tiemann, Michael, Hennig, Philipp, Briol, François-Xavier
Bayesian probabilistic numerical methods for numerical integration offer significant advantages over their non-Bayesian counterparts: they can encode prior information about the integrand, and can quantify uncertainty over estimates of an integral. However, the most popular algorithm in this class, Bayesian quadrature, is based on Gaussian process models and is therefore associated with a high computational cost. To improve scalability, we propose an alternative approach based on Bayesian neural networks which we call Bayesian Stein networks. The key ingredients are a neural network architecture based on Stein operators, and an approximation of the Bayesian posterior based on the Laplace approximation. We show that this leads to orders of magnitude speed-ups on the popular Genz functions benchmark, and on challenging problems arising in the Bayesian analysis of dynamical systems, and the prediction of energy production for a large-scale wind farm.
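The Stein-operator idea mentioned above can be illustrated with a minimal sketch. It assumes the one-dimensional Langevin Stein operator, (L u)(x) = u'(x) + u(x) d/dx log pi(x), and a standard normal pi; the abstract specifies neither. The single tanh unit standing in for the network, and the names `stein_model`, `theta0`, `a` and `b`, are hypothetical. Because (L u)(x) integrates to zero against pi for well-behaved u, the integral of the model theta0 + (L u)(x) against pi is just theta0, so fitting such a model to integrand evaluations would yield an integral estimate directly.

```python
import math


def std_normal_pdf(x: float) -> float:
    """Density of pi = N(0, 1), the assumed integration measure."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def score(x: float) -> float:
    """d/dx log pi(x) for the standard normal."""
    return -x


def u(x: float, a: float, b: float) -> float:
    """A single tanh unit standing in for a neural network (hypothetical)."""
    return math.tanh(a * x + b)


def du(x: float, a: float, b: float) -> float:
    """Derivative of u with respect to x."""
    t = math.tanh(a * x + b)
    return a * (1.0 - t * t)


def stein_model(x: float, theta0: float, a: float, b: float) -> float:
    """theta0 plus the Langevin Stein operator applied to u."""
    return theta0 + du(x, a, b) + u(x, a, b) * score(x)


def integrate_against_pi(g, lo: float = -12.0, hi: float = 12.0, n: int = 48_000) -> float:
    """Trapezoid rule for the integral of g(x) * pi(x) over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.5 * (g(lo) * std_normal_pdf(lo) + g(hi) * std_normal_pdf(hi))
    for i in range(1, n):
        x = lo + i * h
        total += g(x) * std_normal_pdf(x)
    return total * h


theta0, a, b = 1.7, 0.8, -0.3  # arbitrary illustrative parameters
integral = integrate_against_pi(lambda x: stein_model(x, theta0, a, b))
print(integral)  # numerically very close to theta0
```

The quadrature here only verifies the zero-mean property; in the setting the abstract describes, the parameters would instead be fitted to evaluations of the integrand, with a Laplace approximation supplying uncertainty over theta0.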
Affine Invariant Ensemble Transform Methods to Improve Predictive Uncertainty in ReLU Networks
Bhandari, Diksha, Pidstrigach, Jakiw, Reich, Sebastian
We consider the problem of performing Bayesian inference for logistic regression using appropriate extensions of the ensemble Kalman filter. We propose two interacting particle systems that sample from an approximate posterior, and we prove quantitative convergence rates of these systems to their mean-field limit as the number of particles tends to infinity. Furthermore, we apply these techniques and examine their effectiveness as methods of Bayesian approximation for quantifying predictive uncertainty in ReLU networks.
Quantum-Inspired Machine Learning: a Survey
Huynh, Larry, Hong, Jin, Mian, Ajmal, Suzuki, Hajime, Wu, Yanqiu, Camtepe, Seyit
Quantum-inspired Machine Learning (QiML) is a burgeoning field, receiving global attention from researchers for its potential to leverage principles of quantum mechanics within classical computational frameworks. However, current review literature often presents a superficial exploration of QiML, focusing instead on the broader Quantum Machine Learning (QML) field. In response to this gap, this survey provides an integrated and comprehensive examination of QiML. It explores QiML's diverse research domains, including tensor network simulations and dequantized algorithms, among others; showcases recent advancements and practical applications; and illuminates potential future research avenues. Further, a concrete definition of QiML is established by analyzing various prior interpretations of the term and their inherent ambiguities. As QiML continues to evolve, we anticipate a wealth of future developments drawing from quantum mechanics, quantum computing, and classical machine learning, enriching the field further. This survey serves as a guide for researchers and practitioners alike, providing a holistic understanding of QiML's current landscape and future directions.
Variations and Relaxations of Normalizing Flows
Kelly, Keegan, Piedras, Lorena, Rao, Sukrit, Roth, David
Normalizing Flows (NFs) describe a class of models that express a complex target distribution as the composition of a series of bijective transformations over a simpler base distribution. By limiting the space of candidate transformations to diffeomorphisms, NFs enjoy efficient, exact sampling and density evaluation, enabling NFs to flexibly behave as both discriminative and generative models. Their restriction to diffeomorphisms, however, enforces that input, output and all intermediary spaces share the same dimension, limiting their ability to effectively represent target distributions with complex topologies (Zhang and Chen 2021). Additionally, in cases where the prior and target distributions are not homeomorphic, Normalizing Flows can leak mass outside of the support of the target (Cornish et al. 2019; Wu et al. 2020). This survey covers a selection of recent works that combine aspects of other generative model classes, such as VAEs and diffusion, and in doing so loosen the strict bijectivity constraints of NFs to achieve a balance of expressivity, training speed, sample efficiency and likelihood tractability.
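The exact sampling and density evaluation mentioned above both follow from the change-of-variables formula, log p_X(x) = log p_Z(f^{-1}(x)) + log |d f^{-1}(x) / dx|. The sketch below is a toy one-dimensional flow, not taken from any of the surveyed works: the `Affine` and `Exp` layers and their parameters are illustrative assumptions. Composing them over a standard normal base yields a log-normal distribution, which gives a closed form to check against.

```python
import math
import random


def base_log_prob(z: float) -> float:
    """Log-density of the standard normal base distribution."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


class Affine:
    """Bijection x = scale * z + shift (scale must be nonzero)."""

    def __init__(self, scale: float, shift: float) -> None:
        self.scale, self.shift = scale, shift

    def forward(self, z: float) -> float:
        return self.scale * z + self.shift

    def inverse(self, x: float) -> float:
        return (x - self.shift) / self.scale

    def log_abs_det_inverse(self, x: float) -> float:
        return -math.log(abs(self.scale))


class Exp:
    """Bijection x = exp(z) from the real line onto (0, inf)."""

    def forward(self, z: float) -> float:
        return math.exp(z)

    def inverse(self, x: float) -> float:
        return math.log(x)

    def log_abs_det_inverse(self, x: float) -> float:
        return -math.log(x)  # d/dx log(x) = 1 / x


def flow_sample(layers, rng: random.Random) -> float:
    """Exact sampling: push a base sample through every layer in order."""
    z = rng.gauss(0.0, 1.0)
    for layer in layers:
        z = layer.forward(z)
    return z


def flow_log_prob(layers, x: float) -> float:
    """Exact density: invert layer by layer, accumulating log-Jacobians."""
    log_det = 0.0
    for layer in reversed(layers):
        log_det += layer.log_abs_det_inverse(x)
        x = layer.inverse(x)
    return base_log_prob(x) + log_det


layers = [Affine(scale=0.5, shift=1.0), Exp()]
rng = random.Random(0)
sample = flow_sample(layers, rng)
print(sample, flow_log_prob(layers, sample))
```

Every layer maps the real line or an interval to a space of the same dimension, which is the constraint the survey's relaxations aim to loosen.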
Prior Density Learning in Variational Bayesian Phylogenetic Parameters Inference
Remita, Amine M., Vitae, Golrokh, Diallo, Abdoulaye Baniré
Advances in variational inference are providing promising paths in Bayesian estimation problems. These advances make variational phylogenetic inference an alternative approach to Markov Chain Monte Carlo methods for approximating the phylogenetic posterior. However, one of the main drawbacks of such approaches is that they model the prior through fixed distributions, which could bias the posterior approximation if they are distant from the current data distribution. In this paper, we propose an approach and an implementation framework to relax the rigidity of the prior densities by learning their parameters using a gradient-based method and a neural network-based parameterization. We applied this approach to the estimation of branch lengths and evolutionary parameters under several Markov chain substitution models. The simulation results show that the approach is powerful in estimating branch lengths and evolutionary model parameters. They also show that a flexible prior model could provide better results than a predefined prior model. Finally, the results highlight that using neural networks improves the initialization of the optimization of the prior density parameters.
Generalization Bounds: Perspectives from Information Theory and PAC-Bayes
Hellström, Fredrik, Durisi, Giuseppe, Guedj, Benjamin, Raginsky, Maxim
A fundamental question in theoretical machine learning is generalization. Over the past decades, the PAC-Bayesian approach has been established as a flexible framework to address the generalization capabilities of machine learning algorithms, and design new ones. Recently, it has garnered increased interest due to its potential applicability for a variety of learning algorithms, including deep neural networks. In parallel, an information-theoretic view of generalization has developed, wherein the relation between generalization and various information measures has been established. This framework is intimately connected to the PAC-Bayesian approach, and a number of results have been independently discovered in both strands. In this monograph, we highlight this strong connection and present a unified treatment of generalization. We present techniques and results that the two perspectives have in common, and discuss the approaches and interpretations that differ. In particular, we demonstrate how many proofs in the area share a modular structure, through which the underlying ideas can be intuited. We pay special attention to the conditional mutual information (CMI) framework; analytical studies of the information complexity of learning algorithms; and the application of the proposed methods to deep learning. This monograph is intended to provide a comprehensive introduction to information-theoretic generalization bounds and their connection to PAC-Bayes, serving as a foundation from which the most recent developments are accessible. It is aimed broadly towards researchers with an interest in generalization and theoretical machine learning.
Mean-field Variational Inference via Wasserstein Gradient Flow
One of the core problems of modern Bayesian inference is to compute the posterior distribution, a joint probability measure over unknown quantities, such as model parameters and unobserved latent variables, obtained by combining data information with prior knowledge in a principled manner. Modern statistics often relies on complex models for which the posterior distribution is analytically intractable and requires approximate computation. As a common alternative to the conventional Markov chain Monte Carlo (MCMC) sampling approach for approximating the posterior, variational inference (VI, [10]), or variational Bayes [27], finds the closest member in a user-specified class of analytically tractable distributions, referred to as the variational (distribution) family, to approximate the target posterior. Although MCMC is asymptotically exact, VI is usually orders of magnitude faster [12, 62] since it turns the sampling or integration problem into an optimization problem. VI has successfully demonstrated its power in a wide variety of applications, including clustering [11, 23], semi-supervised learning [38], neural-network training [5, 52], and probabilistic modeling [13, 36]. Among various approximating schemes, the mean-field (MF) approximation, which originates from statistical mechanics and uses the approximating family consisting of all fully factorized density functions over (blocks of) the unknown quantities, is the most widely used and representative instance of VI, conceptually simple yet practically powerful. On the downside, VI still requires certain conditional conjugacy structure to facilitate efficient computation (c.f.
Blink: Link Local Differential Privacy in Graph Neural Networks via Bayesian Estimation
Zhu, Xiaochen, Tan, Vincent Y. F., Xiao, Xiaokui
Graph neural networks (GNNs) have gained an increasing amount of popularity due to their superior capability in learning node embeddings for various graph inference tasks, but training them can raise privacy concerns. To address this, we propose using link local differential privacy over decentralized nodes, enabling collaboration with an untrusted server to train GNNs without revealing the existence of any link. Our approach spends the privacy budget separately on links and degrees of the graph for the server to better denoise the graph topology using Bayesian estimation, alleviating the negative impact of LDP on the accuracy of the trained GNNs. We bound the mean absolute error of the inferred link probabilities against the ground truth graph topology. We then propose two variants of our LDP mechanism complementing each other in different privacy settings, one of which estimates fewer links under lower privacy budgets to avoid false positive link estimates when the uncertainty is high, while the other utilizes more information and performs better given relatively higher privacy budgets. Furthermore, we propose a hybrid variant that combines both strategies and is able to perform better across different privacy budgets. Extensive experiments show that our approach outperforms existing methods in terms of accuracy under varying privacy budgets.
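The general idea of denoising privatized links with Bayesian estimation can be sketched on a single adjacency bit. This is a simplified illustration, not the mechanism proposed in the paper: it assumes plain randomized response as the link-level LDP mechanism and a fixed prior link probability, whereas the abstract describes a prior informed by separately privatized degrees. All function names and parameter values are hypothetical.

```python
import math
import random


def keep_probability(epsilon: float) -> float:
    """Probability that randomized response reports the true bit."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(bit: int, epsilon: float, rng: random.Random) -> int:
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it."""
    return bit if rng.random() < keep_probability(epsilon) else 1 - bit


def link_posterior(noisy_bit: int, epsilon: float, prior: float) -> float:
    """Posterior probability of a true link given the noisy bit (Bayes' rule)."""
    q = keep_probability(epsilon)
    like_link = q if noisy_bit == 1 else 1.0 - q
    like_no_link = 1.0 - q if noisy_bit == 1 else q
    evidence = prior * like_link + (1.0 - prior) * like_no_link
    return prior * like_link / evidence


rng = random.Random(0)
epsilon = math.log(3.0)  # keep probability 0.75 (illustrative budget)
prior = 0.1              # assumed prior link density (hypothetical)
true_row = [0, 1, 0, 0, 1, 0, 0, 0]
noisy_row = [randomized_response(b, epsilon, rng) for b in true_row]
posteriors = [link_posterior(b, epsilon, prior) for b in noisy_row]
print(noisy_row)
print([round(p, 3) for p in posteriors])
```

With a sparse prior and a small privacy budget, even a reported link has a low posterior probability (0.25 in this setting), which is in line with the abstract's motivation for a variant that estimates fewer links when uncertainty is high.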
Dynamic Causal Graph Convolutional Network for Traffic Prediction
Lin, Junpeng, Li, Ziyue, Li, Zhishuai, Bai, Lei, Zhao, Rui, Zhang, Chen
Modeling complex spatiotemporal dependencies in correlated traffic series is essential for traffic prediction. While recent works have shown improved prediction performance by using neural networks to extract spatiotemporal correlations, their effectiveness depends on the quality of the graph structures used to represent the spatial topology of the traffic network. In this work, we propose a novel approach for traffic prediction that embeds a time-varying dynamic Bayesian network to capture the fine spatiotemporal topology of traffic data. We then use graph convolutional networks to generate traffic forecasts. To enable our method to efficiently model nonlinear traffic propagation patterns, we develop a deep learning-based module as a hyper-network to generate stepwise dynamic causal graphs. Our experimental results on a real traffic dataset demonstrate the superior prediction performance of the proposed method. The code is available at https://github.com/MonBG/DCGCN.
Truncated Diffusion Probabilistic Models and Diffusion-based Adversarial Auto-Encoders
Zheng, Huangjie, He, Pengcheng, Chen, Weizhu, Zhou, Mingyuan
Employing a forward diffusion chain to gradually map the data to a noise distribution, diffusion-based generative models learn how to generate the data by inferring a reverse diffusion chain. However, this approach is slow and costly because it needs many forward and reverse steps. We propose a faster and cheaper approach that adds noise not until the data become pure random noise, but until they reach a hidden noisy-data distribution that we can confidently learn. Then, we use fewer reverse steps to generate data by starting from this hidden distribution that is made similar to the noisy data. We reveal that the proposed model can be cast as an adversarial auto-encoder empowered by both the diffusion process and a learnable implicit prior. Experimental results show that, even with a significantly smaller number of reverse diffusion steps, the proposed truncated diffusion probabilistic models can provide consistent improvements over the non-truncated ones in terms of performance in both unconditional and text-guided image generation.
Generating photo-realistic images with probabilistic models is a challenging and important task in machine learning and computer vision, with many potential applications in data augmentation, image editing, style transfer, etc. This new modeling class, which includes both score-based and diffusion-based generative models, uses noise injection to gradually corrupt the data distribution into a simple noise distribution that can be easily sampled from, and then uses a denoising network to reverse the noise injection to generate photo-realistic images. From the perspective of score matching (Hyvärinen & Dayan, 2005; Vincent, 2011) and Langevin dynamics (Neal, 2011; Welling & Teh, 2011), the denoising network is trained by matching the score function, which is the gradient of the log-density of the data, of the corrupted data distribution and that of the generator distribution at different noise levels (Song & Ermon, 2019).
This training objective can also be formulated under diffusion-based generative models (Sohl-Dickstein et al., 2015; Ho et al., 2020). These two types of models have been further unified by Song et al. (2021b) under the framework of discretized stochastic differential equations.
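The effect of truncating the forward chain can be seen from the closed-form marginal of a Gaussian diffusion, x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps, where alpha_bar_t is the product of (1 - beta_s) over s = 1..t. The sketch below assumes the linear noise schedule of Ho et al. (2020), with 1000 steps and beta from 1e-4 to 0.02. The truncation point of 100 steps is a hypothetical choice for illustration, not a value from the paper.

```python
import math
import random


def linear_betas(num_steps: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bar(betas, t: int) -> float:
    """Cumulative product of (1 - beta_s) for s = 1..t."""
    prod = 1.0
    for beta in betas[:t]:
        prod *= 1.0 - beta
    return prod


def diffuse(x0: float, betas, t: int, rng: random.Random) -> float:
    """Sample x_t given x_0 in one shot from the closed-form marginal."""
    ab = alpha_bar(betas, t)
    return math.sqrt(ab) * x0 + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0)


betas = linear_betas()
full_steps, truncated_steps = 1000, 100  # truncated_steps is hypothetical
rng = random.Random(0)
for t in (truncated_steps, full_steps):
    ab = alpha_bar(betas, t)
    # Signal scale sqrt(alpha_bar_t) and noise variance 1 - alpha_bar_t at step t.
    print(t, round(math.sqrt(ab), 4), round(1.0 - ab, 4), diffuse(1.0, betas, t, rng))
```

Under this schedule most of the signal survives at the truncation point, so the end of the truncated chain is a noisy-data distribution rather than pure noise. That is why, as the abstract describes, the starting distribution for the reverse chain has to be learned rather than fixed to a standard Gaussian.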