Uncertainty
Energy-Based Modelling for Discrete and Mixed Data via Heat Equations on Structured Spaces
Training energy-based models (EBMs) on data in discrete or mixed state spaces poses significant challenges due to the lack of robust and fast sampling methods. In this work, we propose to train discrete EBMs with Energy Discrepancy, a loss function that only requires evaluating the energy function at data points and their perturbed counterparts, thus eliminating the need for Markov chain Monte Carlo.
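To make concrete what a loss that only evaluates the energy at data points and their perturbed counterparts can look like, here is a minimal sketch for binary data. The contrastive form of the estimate, the Bernoulli bit-flip perturbation, the quadratic `energy`, and the parameters `rate`, `n_neg`, and `w` are all illustrative assumptions, not details taken from the abstract.

```python
import numpy as np

def energy(x, W, b):
    # Hypothetical quadratic energy over binary vectors (an assumption).
    return -(np.einsum("...i,ij,...j->...", x, W, x) + x @ b)

def flip(x, rate, rng):
    # Perturb binary entries by flipping each one independently with prob. `rate`.
    mask = rng.random(x.shape) < rate
    return np.where(mask, 1 - x, x)

def perturbation_loss(x, W, b, rate=0.1, n_neg=16, w=1.0, seed=0):
    # Assumed contrastive-style estimate: compare the energy of each data
    # point with the energies of points reached by perturbing it twice.
    rng = np.random.default_rng(seed)
    y = flip(x, rate, rng)                           # perturbed data, shape (N, d)
    y_rep = np.repeat(y[:, None, :], n_neg, axis=1)  # shape (N, M, d)
    x_neg = flip(y_rep, rate, rng)                   # perturbed counterparts
    diff = energy(x, W, b)[:, None] - energy(x_neg, W, b)  # shape (N, M)
    return np.mean(np.log(w / n_neg + np.mean(np.exp(diff), axis=1)))

rng = np.random.default_rng(1)
x_data = rng.integers(0, 2, size=(32, 5))
loss_value = perturbation_loss(x_data, np.zeros((5, 5)), np.zeros(5))
```

No sampling from the model is needed: the only model-dependent quantities are energy evaluations at `x` and `x_neg`. With a constant energy, as in the usage lines, every energy difference is zero and the loss reduces to `log(1 + w / n_neg)`.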
Structure Learning with Side Information: Sample Complexity
Graphical models are widely used to compactly model the conditional interdependence among multiple random variables (Lauritzen [1996], Pearl [2009]). The vertices of the graph represent the random variables (RVs), while the edges encode the interdependence among the RVs. The complete structure of the graph is analytically captured by the joint probability distribution of the random variables. Graphical models offer effective and tractable solutions to various inferential and decision-making problems in different domains, e.g., computer vision (Won and Derin [1992]), genetics (Chen et al. [2013], Fang et al. [2016], Dobra et al. [2004]), social networks (Jacob et al. [2014]), and power systems (Dvijotham et al. [2017]).
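As a small illustration of how a joint distribution captures graph structure, consider a Gaussian graphical model, where a zero off-diagonal entry of the precision (inverse covariance) matrix corresponds to a missing edge. The choice of a Gaussian model, the three-variable chain, and the helper `edges_from_precision` are assumptions made for this sketch and are not taken from the text above.

```python
import numpy as np

def edges_from_precision(K, tol=1e-10):
    # In a Gaussian graphical model, variables i and j share an edge
    # exactly when the precision entry K[i, j] is nonzero.
    d = K.shape[0]
    return [(i, j) for i in range(d) for j in range(i + 1, d)
            if abs(K[i, j]) > tol]

# Hypothetical chain X0 - X1 - X2: no direct edge between X0 and X2.
K = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
cov = np.linalg.inv(K)           # covariance implied by the precision matrix
edges = edges_from_precision(K)  # graph read off from the joint distribution
```

Here `cov[0, 2]` is nonzero even though `K[0, 2]` is zero: X0 and X2 are correlated through X1 but conditionally independent given X1, which is exactly what the missing edge encodes.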
On the Convergence of Black-Box Variational Inference
We provide the first convergence guarantee for black-box variational inference (BBVI) with the reparameterization gradient. While preliminary investigations worked on simplified versions of BBVI (e.g., bounded domain, bounded support, or optimizing only the scale), our setup does not need any such algorithmic modifications. Our results hold for log-smooth posterior densities with and without strong log-concavity and the location-scale variational family. Notably, our analysis reveals that certain algorithm design choices commonly employed in practice, such as nonlinear parameterizations of the scale matrix, can result in suboptimal convergence rates. Fortunately, running BBVI with proximal stochastic gradient descent fixes these limitations and thus achieves the strongest known convergence guarantees. We evaluate this theoretical insight by comparing proximal SGD against other standard implementations of BBVI on large-scale Bayesian inference problems.
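The combination of reparameterization gradients and proximal SGD can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the implementation evaluated above: the variational family is a mean-field Gaussian with a linear (identity) scale parameterization, the target is a hypothetical diagonal Gaussian, and the entropy term -sum(log s) is handled by its closed-form proximal operator, obtained by solving u^2 - s*u - step = 0 for the positive root.

```python
import numpy as np

def grad_log_p(z, mu, sigma):
    # Gradient of the log-density of the toy diagonal Gaussian target.
    return -(z - mu) / sigma**2

def prox_neg_log(u, step):
    # Proximal operator of h(s) = -log(s): the positive root of
    # s^2 - u*s - step = 0, which keeps the scale strictly positive.
    return 0.5 * (u + np.sqrt(u**2 + 4.0 * step))

def bbvi_proximal_sgd(mu, sigma, n_iters=4000, step=0.01, n_samples=8, seed=0):
    rng = np.random.default_rng(seed)
    d = mu.shape[0]
    m, s = np.zeros(d), np.ones(d)      # location and scale of q
    for _ in range(n_iters):
        eps = rng.standard_normal((n_samples, d))
        z = m + s * eps                 # reparameterized samples from q
        g = -grad_log_p(z, mu, sigma)   # gradient of the energy -log p
        grad_m = g.mean(axis=0)         # reparameterization gradient wrt m
        grad_s = (g * eps).mean(axis=0) # reparameterization gradient wrt s
        m = m - step * grad_m           # plain SGD step on the location
        s = prox_neg_log(s - step * grad_s, step)  # proximal step on the scale
    return m, s

m_hat, s_hat = bbvi_proximal_sgd(np.array([1.0, -2.0]), np.array([0.5, 2.0]))
```

Because the target here is itself Gaussian, the optimal variational parameters coincide with the target mean and standard deviation, so `m_hat` and `s_hat` should land close to them up to stochastic-gradient noise. The proximal step never produces a non-positive scale, which is why no nonlinear reparameterization of `s` is needed in this sketch.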