A Derivation of the Evidence Lower Bound and SLAC Objectives

Neural Information Processing Systems 

To derive the evidence lower bound, we use the posterior from Equation (11), the likelihood from Equation (12), and Jensen's inequality. These objectives lead to the model, policy, and critic losses.

In this section, we also describe the architecture of our sequential latent variable model. The parameters of the convolution layers are shared among both distributions, and the latent variables have 32 and 256 dimensions, respectively. Before the agent starts learning on the task, the model is first pretrained using a small amount of random data.
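Since Equations (11) and (12) are not reproduced in this excerpt, the Jensen's-inequality step can be sketched generically. Assuming latents $z_{1:\tau+1}$, observations $x_{1:\tau+1}$, actions $a_{1:\tau}$, and a variational posterior $q$ (the exact factorizations are given by the referenced equations), the bound follows by writing the log-likelihood as an expectation under $q$ and moving the logarithm inside:

```latex
\begin{align}
\log p(x_{1:\tau+1} \mid a_{1:\tau})
  &= \log \int p(x_{1:\tau+1}, z_{1:\tau+1} \mid a_{1:\tau}) \, dz_{1:\tau+1} \\
  &= \log \mathbb{E}_{z_{1:\tau+1} \sim q(\cdot \mid x_{1:\tau+1}, a_{1:\tau})}
     \left[ \frac{p(x_{1:\tau+1}, z_{1:\tau+1} \mid a_{1:\tau})}
                 {q(z_{1:\tau+1} \mid x_{1:\tau+1}, a_{1:\tau})} \right] \\
  &\geq \mathbb{E}_{z_{1:\tau+1} \sim q(\cdot \mid x_{1:\tau+1}, a_{1:\tau})}
     \left[ \log p(x_{1:\tau+1}, z_{1:\tau+1} \mid a_{1:\tau})
          - \log q(z_{1:\tau+1} \mid x_{1:\tau+1}, a_{1:\tau}) \right]
\end{align}
```

The inequality is Jensen's inequality applied to the concave logarithm; maximizing the resulting lower bound with respect to the model and posterior parameters yields the model loss referenced above.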
