 mcmc inference


Measuring the reliability of MCMC inference with bidirectional Monte Carlo

Neural Information Processing Systems

Markov chain Monte Carlo (MCMC) is one of the main workhorses of probabilistic inference, but it is notoriously hard to measure the quality of approximate posterior samples. This challenge is particularly salient in black box inference methods, which can hide details and obscure inference failures. In this work, we extend the recently introduced bidirectional Monte Carlo technique to evaluate MCMC-based posterior inference algorithms. By running annealed importance sampling (AIS) chains both from prior to posterior and vice versa on simulated data, we upper bound in expectation the symmetrized KL divergence between the true posterior distribution and the distribution of approximate samples. We integrate our method into two probabilistic programming languages, WebPPL and Stan, and validate it on several models and datasets.
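The bidirectional idea in the abstract above can be illustrated on a toy problem. The following is a minimal sketch, not the authors' WebPPL or Stan implementation: it assumes a conjugate model (theta ~ N(0, 1), y | theta ~ N(theta, 1)) so that the exact log marginal likelihood is available for comparison, and it uses a random-walk Metropolis transition. All function names (`ais`, `bdmc_bounds`, and so on) and parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def log_prior(theta):
    # log N(theta | 0, 1)
    return -0.5 * np.log(2 * np.pi) - 0.5 * theta ** 2

def log_lik(theta, y):
    # log N(y | theta, 1)
    return -0.5 * np.log(2 * np.pi) - 0.5 * (y - theta) ** 2

def ais(theta, betas, y, rng, step=1.0):
    """Run AIS chains along the schedule `betas`; return log importance weights."""
    log_w = np.zeros_like(theta)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Weight update: log f_b(theta) - log f_b_prev(theta), f_b = prior * lik**b
        log_w += (b - b_prev) * log_lik(theta, y)
        # One random-walk Metropolis step that leaves f_b invariant
        prop = theta + step * rng.standard_normal(theta.shape)
        log_acc = (log_prior(prop) + b * log_lik(prop, y)
                   - log_prior(theta) - b * log_lik(theta, y))
        accept = np.log(rng.uniform(size=theta.shape)) < log_acc
        theta = np.where(accept, prop, theta)
    return log_w

def log_mean_exp(a):
    m = a.max()
    return m + np.log(np.mean(np.exp(a - m)))

def bdmc_bounds(n_chains=200, n_steps=500, seed=0):
    rng = np.random.default_rng(seed)
    # Simulated data: theta_star is an exact posterior sample given y
    theta_star = rng.standard_normal()
    y = theta_star + rng.standard_normal()
    betas = np.linspace(0.0, 1.0, n_steps + 1)
    # Forward chains (prior -> posterior): stochastic lower bound on log p(y)
    fwd = ais(rng.standard_normal(n_chains), betas, y, rng)
    # Reverse chains (posterior -> prior), all started at the exact sample:
    # their weights estimate 1 / p(y), giving a stochastic upper bound
    rev = ais(np.full(n_chains, theta_star), betas[::-1], y, rng)
    lower = log_mean_exp(fwd)
    upper = -log_mean_exp(rev)
    # Exact value for this toy model: marginally y ~ N(0, 2)
    exact = -0.5 * np.log(4 * np.pi) - y ** 2 / 4
    return lower, upper, exact

lower, upper, exact = bdmc_bounds()
print(f"lower={lower:.3f}  exact={exact:.3f}  upper={upper:.3f}")
```

Both bounds hold only in expectation, so an individual run can place them slightly out of order. The gap between the two estimates is the quantity that the abstract relates to the symmetrized KL divergence; in a real model the exact value would not be available and only that gap would be observed.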


Reviews: Measuring the reliability of MCMC inference with bidirectional Monte Carlo

Neural Information Processing Systems

This paper has both strong and weaker points. The main strength is that using BDMC to assess the convergence of MCMC operators is a beautifully simple idea that is easy to implement, which in my opinion gives this work potentially high impact. This is particularly true for probabilistic programming systems, the envisioned use case here, and I think all such systems would do well to at least implement this method. The authors cite an arXiv submission on BDMC as existing work, but (I think wisely) choose to devote a relatively large amount of space to restating its description. Unfortunately, this means that the main technical contributions, presented in Sections 3.1 and 3.2, are somewhat rushed, and this is also where the writing quality slips a bit.


Learning Multimodal Latent Space with EBM Prior and MCMC Inference

Yuan, Shiyu, Lipizzi, Carlo, Han, Tian

arXiv.org Artificial Intelligence

Multimodal generative models are crucial for various applications. We propose an approach that combines an expressive energy-based model (EBM) prior with Markov Chain Monte Carlo (MCMC) inference in the latent space for multimodal generation. The EBM prior acts as an informative guide, while MCMC inference, specifically through short-run Langevin dynamics, brings the posterior distribution closer to its true form. This method not only provides an expressive prior to better capture the complexity of multimodality but also improves the learning of shared latent variables for more coherent generation across modalities. Our proposed method is supported by empirical experiments, underscoring the effectiveness of our EBM prior with MCMC inference in enhancing cross-modal and joint generative tasks in multimodal contexts.


Distributed MCMC inference for Bayesian Non-Parametric Latent Block Model

Khoufache, Reda, Belhadj, Anisse, Azzag, Hanene, Lebbah, Mustapha

arXiv.org Artificial Intelligence

Given a data matrix in which rows represent observations and columns represent variables or features, co-clustering, also known as bi-clustering, aims to infer a row partition and a column partition simultaneously. The resulting partition is composed of homogeneous blocks. When a dataset exhibits a dual structure between observations and variables, co-clustering outperforms conventional clustering algorithms, which infer only a row partition without considering the relationships between observations and variables. Co-clustering is a powerful data mining tool for two-dimensional data and is widely applied in fields such as bioinformatics [1]. To tackle the co-clustering problem, the Latent Block Model (LBM) was introduced by [2].


PriorCVAE: scalable MCMC parameter inference with Bayesian deep generative modelling

Semenova, Elizaveta, Verma, Prakhar, Cairney-Leeming, Max, Solin, Arno, Bhatt, Samir, Flaxman, Seth

arXiv.org Machine Learning

Recent advances have shown that GP priors, or their finite realisations, can be encoded using deep generative models such as variational autoencoders (VAEs). These learned generators can serve as drop-in replacements for the original priors during MCMC inference. While this approach enables efficient inference, it loses information about the hyperparameters of the original models, and consequently makes inference over hyperparameters impossible and the learned priors indistinct. To overcome this limitation, we condition the VAE on stochastic process hyperparameters. This allows the joint encoding of hyperparameters with GP realizations and their subsequent estimation during inference. Further, we demonstrate that our proposed method, PriorCVAE, is agnostic to the nature of the models which it approximates, and can be used, for instance, to encode solutions of ODEs. It provides a practical tool for approximate inference and shows potential in real-life spatial and spatiotemporal applications.


Likelihood-Based Generative Radiance Field with Latent Space Energy-Based Model for 3D-Aware Disentangled Image Representation

Zhu, Yaxuan, Xie, Jianwen, Li, Ping

arXiv.org Machine Learning

We propose the NeRF-LEBM, a likelihood-based top-down 3D-aware 2D image generative model that incorporates 3D representation via Neural Radiance Fields (NeRF) and the 2D imaging process via differentiable volume rendering. The model represents an image as a rendering process from 3D object to 2D image and is conditioned on latent variables that account for object characteristics and are assumed to follow informative trainable energy-based prior models. We propose two likelihood-based learning frameworks to train the NeRF-LEBM: (i) maximum likelihood estimation with Markov chain Monte Carlo-based inference and (ii) variational inference with the reparameterization trick. We study our models in scenarios with both known and unknown camera poses. Experiments on several benchmark datasets demonstrate that the NeRF-LEBM can infer 3D object structures from 2D images, generate 2D images with novel views and objects, learn from incomplete 2D images, and learn from 2D images with known or unknown camera poses.


A Tale of Two Latent Flows: Learning Latent Space Normalizing Flow with Short-run Langevin Flow for Approximate Inference

Xie, Jianwen, Zhu, Yaxuan, Xu, Yifei, Li, Dingcheng, Li, Ping

arXiv.org Artificial Intelligence

We study a normalizing flow in the latent space of a top-down generator model, in which the normalizing flow model plays the role of the informative prior model of the generator. We propose to jointly learn the latent space normalizing flow prior model and the top-down generator model by a Markov chain Monte Carlo (MCMC)-based maximum likelihood algorithm, where a short-run Langevin sampling from the intractable posterior distribution is performed to infer the latent variables for each observed example, so that the parameters of the normalizing flow prior and the generator can be updated with the inferred latent variables. We show that, under the scenario of non-convergent short-run MCMC, the finite step Langevin dynamics is a flow-like approximate inference model and the learning objective actually follows the perturbation of the maximum likelihood estimation (MLE). We further point out that the learning framework seeks to (i) match the latent space normalizing flow and the aggregated posterior produced by the short-run Langevin flow, and (ii) bias the model from MLE such that the short-run Langevin flow inference is close to the true posterior. Empirical results of extensive experiments validate the effectiveness of the proposed latent space normalizing flow model in the tasks of image generation, image reconstruction, anomaly detection, supervised image inpainting and unsupervised image recovery.
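The short-run Langevin inference step mentioned in the abstract above can be sketched on a toy problem. This is an illustrative example under stated assumptions, not the authors' model: it replaces the normalizing flow prior with a standard Gaussian prior and the neural generator with a fixed linear map `W`, so that the true posterior mean is known in closed form. The names `grad_log_joint` and `short_run_langevin`, the step size, and the number of steps are all hypothetical.

```python
import numpy as np

def grad_log_joint(z, x, W, sigma):
    # Gradient in z of log N(z | 0, I) + log N(x | W z, sigma^2 I),
    # batched over the rows of z
    resid = x - z @ W.T
    return -z + resid @ W / sigma ** 2

def short_run_langevin(x, W, sigma, n_chains=1000, n_steps=20, step=0.3, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_chains, W.shape[1]))  # initialise from the prior
    z0 = z.copy()
    for _ in range(n_steps):
        noise = rng.standard_normal(z.shape)
        # Unadjusted Langevin update with a fixed, small number of steps
        z = z + 0.5 * step ** 2 * grad_log_joint(z, x, W, sigma) + step * noise
    return z0, z

# Toy linear generator (assumed for illustration): x = W z, no observation noise
W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
sigma = 1.0
x = W @ np.array([2.0, -1.0])

# Closed-form Gaussian posterior mean, used only as a reference
precision = np.eye(2) + W.T @ W / sigma ** 2
post_mean = np.linalg.solve(precision, W.T @ x / sigma ** 2)

z0, zK = short_run_langevin(x, W, sigma)
err_before = np.linalg.norm(z0.mean(axis=0) - post_mean)
err_after = np.linalg.norm(zK.mean(axis=0) - post_mean)
print(f"distance to posterior mean: before={err_before:.3f}, after={err_after:.3f}")
```

Because the chain is stopped after a fixed number of steps and has no accept/reject correction, the samples move toward the posterior without matching it exactly, which is the non-convergent regime the abstract analyses.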


Measuring the reliability of MCMC inference with bidirectional Monte Carlo

Grosse, Roger B., Ancha, Siddharth, Roy, Daniel M.

Neural Information Processing Systems

Markov chain Monte Carlo (MCMC) is one of the main workhorses of probabilistic inference, but it is notoriously hard to measure the quality of approximate posterior samples. This challenge is particularly salient in black box inference methods, which can hide details and obscure inference failures. In this work, we extend the recently introduced bidirectional Monte Carlo technique to evaluate MCMC-based posterior inference algorithms. By running annealed importance sampling (AIS) chains both from prior to posterior and vice versa on simulated data, we upper bound in expectation the symmetrized KL divergence between the true posterior distribution and the distribution of approximate samples. We integrate our method into two probabilistic programming languages, WebPPL and Stan, and validate it on several models and datasets.


Infinite Plaid Models for Infinite Bi-Clustering

Ishiguro, Katsuhiko (NTT Corporation) | Sato, Issei (The University of Tokyo) | Nakano, Masahiro (NTT Corporation) | Kimura, Akisato (NTT Corporation) | Ueda, Naonori (NTT Corporation)

AAAI Conferences

We propose a probabilistic model for non-exhaustive and overlapping (NEO) bi-clustering. Our goal is to extract a few sub-matrices from the given data matrix, where the entries of each sub-matrix are characterized by a specific distribution or set of parameters. Existing NEO bi-clustering methods typically require the number of sub-matrices to be specified in advance, which is inherently difficult to fix a priori. In this paper, we extend the plaid model, known as one of the best NEO bi-clustering algorithms, to allow infinite bi-clustering, that is, NEO bi-clustering without specifying the number of sub-matrices. Our model can formally represent infinitely many sub-matrices. We develop an MCMC inference procedure without finite truncation, which can in principle address all possible numbers of sub-matrices. Experiments quantitatively and qualitatively verify the usefulness of the proposed model. The results reveal that our model can offer more precise and in-depth analysis of sub-matrices.