


$R_n(a)$ with $R_n(a) := \sum^{n} \cdots$

Neural Information Processing Systems

In particular, Bai et al. [5] and Jin et al. [31] developed the first algorithms to beat the curse of multiple agents in two-player zero-sum MGs, while Jin et al. [31], Daskalakis et al. [23], Mao and Başar [44], and Song et al. [63] further demonstrated how to accomplish the same goal when learning other computationally tractable solution concepts (e.g., coarse correlated equilibria) in general-sum multi-player Markov games. We shall also briefly remark on the prior works that concern RL with a generative model. A key term in the regret bound (36) is a weighted sum of the "variance-style" quantities $\{\mathsf{Var}_{\pi^k}(\ell^k)\}$. While $\mathsf{Var}_{\pi^k}(\ell^k) \le \|\ell^k\|_\infty^2$ is orderwise tight in the worst-case scenario for a given iteration $k$, exploiting the problem-specific variance-type structure across time is crucial in sharpening the horizon dependence in many RL problems (e.g., Azar et al. [3], Jin et al. [30], Li et al. [41, 40]).

C.1 Preliminaries and notation

Let us start with some preliminary facts and notation.
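For concreteness, the crude bound invoked above is the elementary chain below; the display uses the excerpt's notation ($\pi^k$ a policy over actions, $\ell^k$ a bounded loss vector) and is a reconstruction for the reader, not the paper's equation (36).

```latex
% Variance of \ell^k under \pi^k, and the worst-case bound the text
% compares it against.
\mathsf{Var}_{\pi^k}(\ell^k)
  := \mathbb{E}_{a \sim \pi^k}\!\left[ \ell^k(a)^2 \right]
   - \left( \mathbb{E}_{a \sim \pi^k}\!\left[ \ell^k(a) \right] \right)^2
  \;\le\; \mathbb{E}_{a \sim \pi^k}\!\left[ \ell^k(a)^2 \right]
  \;\le\; \|\ell^k\|_\infty^2 .
```

The first inequality drops the squared mean; the second uses $|\ell^k(a)| \le \|\ell^k\|_\infty$ pointwise, which is exactly the worst-case slack that variance-aware analyses recover.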


Recursive Inference for Variational Autoencoders

Neural Information Processing Systems

Inference networks of traditional Variational Autoencoders (VAEs) are typically amortized, resulting in relatively inaccurate posterior approximation compared to instance-wise variational optimization. Recent semi-amortized approaches were proposed to address this drawback; however, their iterative gradient update procedures can be computationally demanding. In this paper, we consider a different approach: building a mixture inference model. We propose a novel recursive mixture estimation algorithm for VAEs that iteratively augments the current mixture with new components so as to maximally reduce the divergence between the variational and the true posteriors. Using the functional gradient approach, we devise an intuitive learning criterion for selecting a new mixture component: the new component has to improve the data likelihood (lower bound) and, at the same time, be as divergent from the current mixture distribution as possible, thus increasing representational diversity. Although similar approaches, termed boosted variational inference (BVI), have been proposed recently, our method differs from BVI in several aspects, most notably in that ours performs recursive inference for VAEs in the form of amortized inference, while BVI is developed within the standard VI framework, leading to a non-amortized, per-instance optimization that is inappropriate for VAEs. A crucial benefit of our approach is that inference at test time requires only a single feed-forward pass through the mixture inference network, making it significantly faster than the semi-amortized approaches. We show that our approach yields higher test data likelihood than state-of-the-art methods on several benchmark datasets.
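The selection criterion in the abstract (improve the ELBO while staying divergent from the current mixture) can be sketched in a few lines. The PyTorch code below is a minimal illustrative sketch, not the authors' implementation: every name (`GaussianEncoder`, `new_component_loss`, the trade-off weight `beta`) is hypothetical, mixture weights are assumed uniform, and the decoder and likelihood are toy stand-ins.

```python
import math
import torch
import torch.nn as nn

class GaussianEncoder(nn.Module):
    """One mixture component: an amortized diagonal-Gaussian q_c(z|x)."""
    def __init__(self, x_dim, z_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(x_dim, hidden), nn.Tanh())
        self.mu = nn.Linear(hidden, z_dim)
        self.logvar = nn.Linear(hidden, z_dim)

    def forward(self, x):
        h = self.body(x)
        return self.mu(h), self.logvar(h)

def log_gauss(z, mu, logvar):
    # log N(z; mu, diag(exp(logvar))), summed over latent dimensions.
    return -0.5 * (((z - mu) ** 2) / logvar.exp()
                   + logvar + math.log(2 * math.pi)).sum(-1)

def elbo(decoder, x, mu, logvar):
    # One-sample reparameterized ELBO with an N(0, I) prior and a
    # unit-variance Gaussian likelihood (toy choices for this sketch).
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
    log_px_z = -0.5 * ((x - decoder(z)) ** 2).sum(-1)
    kl_prior = 0.5 * (mu ** 2 + logvar.exp() - 1.0 - logvar).sum(-1)
    return log_px_z - kl_prior

def new_component_loss(decoder, new_enc, old_encs, x, beta=0.1):
    # The new component should (i) raise the ELBO and (ii) stay divergent
    # from the frozen current mixture, so a one-sample Monte Carlo estimate
    # of KL(q_new || q_mix) is rewarded (subtracted from the loss).
    mu, logvar = new_enc(x)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
    with torch.no_grad():  # existing mixture parameters are held fixed
        old_params = [enc(x) for enc in old_encs]
    comp_logs = torch.stack([log_gauss(z, m, lv) for m, lv in old_params])
    log_q_mix = torch.logsumexp(comp_logs, dim=0) - math.log(len(old_encs))
    kl_new_mix = log_gauss(z, mu, logvar) - log_q_mix  # MC sample of the KL
    return -(elbo(decoder, x, mu, logvar) + beta * kl_new_mix).mean()

# Toy usage: fit a third component against two frozen ones.
x_dim, z_dim = 20, 5
decoder = nn.Linear(z_dim, x_dim)
old_encs = [GaussianEncoder(x_dim, z_dim) for _ in range(2)]
new_enc = GaussianEncoder(x_dim, z_dim)
loss = new_component_loss(decoder, new_enc, old_encs, torch.randn(8, x_dim))
loss.backward()  # gradients flow into new_enc (and the shared decoder)
```

Note that once trained, each component is an ordinary feed-forward encoder, so evaluating the fitted mixture costs one forward pass per component with no per-instance gradient steps, which is the speed advantage over semi-amortized methods that the abstract highlights.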

