A Derivation of the Evidence Lower Bound and SLAC Objectives
Neural Information Processing Systems
We use the posterior from Equation (11), the likelihood from Equation (12), and Jensen's inequality. These objectives lead to the model, policy, and critic losses.

In this section, we describe the architecture of our sequential latent variable model. The parameters of the convolution layers are shared between the two distributions. The latent variables have 32 and 256 dimensions, respectively. Before the agent starts learning on the task, the model is first pretrained on a small amount of random data.
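The Jensen's inequality step referenced above can be written generically as follows; Equations (11) and (12) from the paper (the posterior and the likelihood) are not reproduced here, so this sketch uses a generic posterior $q(z \mid x)$ and likelihood $p(x \mid z)$:

$$
\log p(x) \;=\; \log \mathbb{E}_{q(z \mid x)}\!\left[ \frac{p(x \mid z)\,p(z)}{q(z \mid x)} \right]
\;\geq\; \mathbb{E}_{q(z \mid x)}\!\left[ \log p(x \mid z) \right] \;-\; D_{\mathrm{KL}}\!\left( q(z \mid x) \,\|\, p(z) \right),
$$

where the inequality follows from Jensen's inequality applied to the concave logarithm. The right-hand side is the evidence lower bound (ELBO), whose two terms give rise to the reconstruction and KL components of the model loss.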
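A minimal numpy sketch of the two-level latent parameterization described above, under heavy assumptions: the paper's shared convolutional trunk is replaced here by a single dense layer, and the feature size and weight initializations are illustrative; only the latent dimensions (32 and 256) come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

FEAT, D1, D2 = 64, 32, 256  # feature size (assumed); latent dims from the text

def shared_features(x, W):
    """Stand-in for the shared convolutional trunk: both latent
    distributions read from the same features."""
    return np.tanh(x @ W)

def gaussian_head(h, W_mu, W_logstd):
    """Separate output heads produce each latent's mean and log-std."""
    return h @ W_mu, h @ W_logstd

def kl_to_standard_normal(mu, logstd):
    """KL( N(mu, diag(exp(logstd)^2)) || N(0, I) ), summed over dims."""
    return 0.5 * np.sum(np.exp(2 * logstd) + mu**2 - 1.0 - 2.0 * logstd)

# Fake observation and random weights, for illustration only.
x = rng.normal(size=(FEAT,))
W = rng.normal(size=(FEAT, FEAT)) * 0.1
h = shared_features(x, W)

mu1, ls1 = gaussian_head(h, rng.normal(size=(FEAT, D1)) * 0.1,
                         rng.normal(size=(FEAT, D1)) * 0.1)
mu2, ls2 = gaussian_head(h, rng.normal(size=(FEAT, D2)) * 0.1,
                         rng.normal(size=(FEAT, D2)) * 0.1)

# Total KL regularizer over both latent levels (always non-negative).
kl = kl_to_standard_normal(mu1, ls1) + kl_to_standard_normal(mu2, ls2)
```

Because both Gaussian heads consume the same `h`, gradients from either latent's loss term update the shared trunk, mirroring the parameter sharing described in the text.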