Goto

Collaborating Authors

 vi scheme




I would like to thank the reviewers for their thorough and constructive comments, and I am confident that all of the

Neural Information Processing Systems

BBBVI took about 3 hours per dataset. The NOMT took less than 5 seconds per dataset. Reviewer 3 noted that the spike-and-slab model does not satisfy the non-overlapping support assumption of Theorem 1. It would be possible to have a "symmetric" version of the theorem, but it would describe a Reviewer 2 suggested using reconstruction error as a metric for the sparse PCA application. I will include a discussion of these similarities and differences in the revision. Reviewer 1 asked if the supports of the mixture distributions must defined a priori .


Flexible mean field variational inference using mixtures of non-overlapping exponential families

arXiv.org Machine Learning

Sparse models are desirable for many applications across diverse domains as they can perform automatic variable selection, aid interpretability, and provide regularization. When fitting sparse models in a Bayesian framework, however, analytically obtaining a posterior distribution over the parameters of interest is intractable for all but the simplest cases. As a result practitioners must rely on either sampling algorithms such as Markov chain Monte Carlo or variational methods to obtain an approximate posterior. Mean field variational inference is a particularly simple and popular framework that is often amenable to analytically deriving closed-form parameter updates. When all distributions in the model are members of exponential families and are conditionally conjugate, optimization schemes can often be derived by hand. Yet, I show that using standard mean field variational inference can fail to produce sensible results for models with sparsity-inducing priors, such as the spike-and-slab. Fortunately, such pathological behavior can be remedied as I show that mixtures of exponential family distributions with non-overlapping support form an exponential family. In particular, any mixture of a diffuse exponential family and a point mass at zero to model sparsity forms an exponential family. Furthermore, specific choices of these distributions maintain conditional conjugacy. I use two applications to motivate these results: one from statistical genetics that has connections to generalized least squares with a spike-and-slab prior on the regression coefficients; and sparse probabilistic principal component analysis. The theoretical results presented here are broadly applicable beyond these two examples.