log-concavity
Beyond Log-concavity: Provable Guarantees for Sampling Multi-modal Distributions using Simulated Tempering Langevin Monte Carlo
A key task in Bayesian machine learning is sampling from distributions that are only specified up to a partition function (i.e., constant of proportionality). One prevalent example of this is sampling posteriors in parametric distributions, such as latent-variable generative models. However sampling (even very approximately) can be #P-hard. Classical results (going back to Bakry and Emery) on sampling focus on log-concave distributions, and show a natural Markov chain called Langevin diffusion mix in polynomial time. However, all log-concave distributions are uni-modal, while in practice it is very common for the distribution of interest to have multiple modes.
Beyond Log-Concavity: Theory and Algorithm for Sum-Log-Concave Optimization
This paper extends the classic theory of convex optimization to the minimization of functions that are equal to the negated logarithm of what we term as a sum-log-concave function, i.e., a sum of log-concave functions. In particular, we show that such functions are in general not convex but still satisfy generalized convexity inequalities. These inequalities unveil the key importance of a certain vector that we call the cross-gradient and that is, in general, distinct from the usual gradient. Thus, we propose the Cross Gradient Descent (XGD) algorithm moving in the opposite direction of the cross-gradient and derive a convergence analysis. As an application of our sum-log-concave framework, we introduce the so-called checkered regression method relying on a sum-log-concave function. This classifier extends (multiclass) logistic regression to non-linearly separable problems since it is capable of tessellating the feature space by using any given number of hyperplanes, creating a checkerboard-like pattern of decision regions.
Beyond Log-concavity: Provable Guarantees for Sampling Multi-modal Distributions using Simulated Tempering Langevin Monte Carlo
Lee, Holden, Risteski, Andrej, Ge, Rong
A key task in Bayesian machine learning is sampling from distributions that are only specified up to a partition function (i.e., constant of proportionality). One prevalent example of this is sampling posteriors in parametric distributions, such as latent-variable generative models. However sampling (even very approximately) can be #P-hard. Classical results (going back to Bakry and Emery) on sampling focus on log-concave distributions, and show a natural Markov chain called Langevin diffusion mix in polynomial time. However, all log-concave distributions are uni-modal, while in practice it is very common for the distribution of interest to have multiple modes.