

 descent algorithm


Supplementary Material for Mixture weights optimisation for Alpha-Divergence Variational Inference Kamélia Daudel1,2, Randal Douc3

Neural Information Processing Systems

Assume that p and k are as in (A1). Then, the two following assertions hold. A.3 The case α < 1 for the Power Descent algorithm. Let α ≠ 1, η ∈ (0,1], κ be such that (α − 1)κ ≥ 0, and let the initial probability measure µ1 ∈ M1(T) be such that Ψα(µ1) < ∞. A common way to approximate intractable integrals of the form (16) is to resort to Importance Sampling methods, and in that case we are also interested in ensuring that the support of the variational approximation q ∈ Q (with q = µk in our case) is included in the support of p. Seeking to solve the Variational Inference optimisation problem inf Dα(µK||P) for α < 1 enables this to happen, as opposed to the case α ≥ 1, for which the α-divergence exhibits the so-called mode-seeking property [2, 3, 4]. As a whole, well-chosen samplers and variance reduction methods appear to be a necessity even in the case α = 1, so that the obtained Monte Carlo estimator of θ ↦ bµ,α(θ) does not suffer from too large a variance.
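To make the Importance Sampling step mentioned above concrete, here is a minimal sketch in Python. It is not taken from the paper: the target p (a standard normal), the proposal q (a wider normal), the integrand h and all function names are illustrative assumptions. It approximates the integral of h(y) p(y) by averaging h(y) p(y) / q(y) over draws from q.

```python
import math
import random


def normal_pdf(y, mean, std):
    """Density of a univariate normal distribution."""
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_sampling_estimate(h, p_pdf, q_pdf, q_sampler, n_samples, rng):
    """Estimate the integral of h(y) * p(y) dy using draws from q.

    Each draw y ~ q contributes h(y) * p(y) / q(y); the average is an
    unbiased estimate as long as q(y) > 0 wherever h(y) * p(y) != 0.
    """
    total = 0.0
    for _ in range(n_samples):
        y = q_sampler(rng)
        total += h(y) * p_pdf(y) / q_pdf(y)
    return total / n_samples


# Hypothetical setup: p = N(0, 1), q = N(0, 2^2), h(y) = y^2.
# The exact value of the integral is the variance of p, i.e. 1.
rng = random.Random(0)
estimate = importance_sampling_estimate(
    h=lambda y: y * y,
    p_pdf=lambda y: normal_pdf(y, 0.0, 1.0),
    q_pdf=lambda y: normal_pdf(y, 0.0, 2.0),
    q_sampler=lambda r: r.gauss(0.0, 2.0),
    n_samples=100_000,
    rng=rng,
)
print(estimate)
```

In this toy setup the proposal has heavier tails than the target, so the weights p(y) / q(y) stay bounded and the estimator has finite variance. A proposal narrower than the target would make the weights blow up in the tails, which is the kind of variance issue the excerpt alludes to.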








∫ (‖Vt(x)‖

Neural Information Processing Systems

We introduce Unbalanced Sobolev Descent (USD), a particle descent algorithm for transporting a high-dimensional source distribution to a target distribution that does not necessarily have the same mass.



Mixture weights optimisation for Alpha-Divergence Variational Inference

Neural Information Processing Systems

The Power Descent, defined for all α ≠ 1, is one such algorithm and we establish in our work the full proof of its convergence towards the optimal mixture weights when α < 1.
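As a rough illustration of what a Power Descent iteration on mixture weights might look like, the sketch below works under stated assumptions. The excerpt does not give the update rule, so the form used here is an assumption: each weight λ_j is multiplied by [∫ k(θ_j, y) (µk(y)/p(y))^(α−1) dy + (α − 1)κ]^(η/(1−α)) and the weights are then renormalised. The Gaussian kernels, the toy target, the grid-based quadrature (in place of a Monte Carlo estimator) and all names are hypothetical choices made for a one-dimensional example.

```python
import math


def gaussian_pdf(y, mean, std=1.0):
    """Density of a univariate normal distribution."""
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def power_descent_weights(weights, kernel_vals, target_vals, dy,
                          alpha, eta, kappa, n_iter):
    """Iterate an ASSUMED form of the Power Descent update on mixture weights.

    kernel_vals[j][i] holds k(theta_j, y_i) and target_vals[i] holds p(y_i)
    on a regular grid with spacing dy.
    """
    assert alpha != 1.0 and 0.0 < eta <= 1.0 and (alpha - 1.0) * kappa >= 0.0
    weights = list(weights)
    n_grid = len(target_vals)
    for _ in range(n_iter):
        # Current mixture density (mu_n k) evaluated on the grid.
        mix = [sum(w * kv[i] for w, kv in zip(weights, kernel_vals))
               for i in range(n_grid)]
        updated = []
        for w, kv in zip(weights, kernel_vals):
            # Riemann sum for int k(theta_j, y) (mu_n k(y) / p(y))^(alpha-1) dy.
            integral = dy * sum(
                k * (m / p) ** (alpha - 1.0)
                for k, m, p in zip(kv, mix, target_vals)
            )
            factor = (integral + (alpha - 1.0) * kappa) ** (eta / (1.0 - alpha))
            updated.append(w * factor)
        total = sum(updated)
        weights = [v / total for v in updated]
    return weights


# Hypothetical toy problem: the target is itself a mixture of the two kernels,
# so the optimal weights are known to be (0.3, 0.7).
dy = 0.02
grid = [-10.0 + dy * i for i in range(1001)]
thetas = [-2.0, 2.0]
kernel_vals = [[gaussian_pdf(y, t) for y in grid] for t in thetas]
target_vals = [0.3 * a + 0.7 * b for a, b in zip(*kernel_vals)]

weights = power_descent_weights(
    [0.5, 0.5], kernel_vals, target_vals, dy,
    alpha=0.5, eta=0.5, kappa=0.0, n_iter=50,
)
print(weights)
```

With α < 1 the exponent η/(1 − α) is positive, so a component sitting where the mixture under-covers the target sees its weight grow. In practice the integral would be estimated by sampling rather than on a grid, which is where the variance concerns raised in the supplementary material come in.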