








Table A: FID on CIFAR10 (averaged over 5 runs; marked methods use comparable networks).

Method            FID     FID-ES
Flow-CE [1*]      37.30   -
VAE-EBLVM [2*]    30.1    -
MDSM [34]         -       31.7
MDSM

Neural Information Processing Systems

We thank all reviewers for their valuable comments. Below, we first address the common concerns and then answer the detailed questions. First, introducing latent variables can improve the sample quality. Indeed, we update Tab. 2 and obtain Tab. A. It also leads to smaller bias (see Fig. A), which agrees with Thm. 2. As stated in L290, a similar protocol is adopted in MDSM [34].



Consider minimizing an empirical loss

\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} L(z_i, \theta).  (1)


Many learning tasks, such as regression and classification, are usually framed that way [1]. When N >> 1, computing the gradient of the objective in (1) becomes a bottleneck, even if individual gradients \nabla_\theta L(z_i, \theta) are cheap to evaluate. For a fixed computational budget, it is thus tempting to replace vanilla gradient descent by more iterations, each using an approximate gradient obtained from only a few data points. Stochastic gradient descent (SGD; [2]) follows this template.
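The template above can be sketched in a few lines: at each iteration, the full gradient of the empirical loss in (1) is replaced by the gradient averaged over a small random subset of the N data points. The sketch below uses a toy least-squares loss as a hypothetical stand-in for the generic L(z_i, theta); the data, step size, and batch size are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: each z_i = (x_i, y_i); the loss L(z_i, theta)
# is squared error, a hypothetical instance of the generic loss in (1).
N, d = 1000, 5
X = rng.normal(size=(N, d))
theta_true = rng.normal(size=d)
y = X @ theta_true + 0.1 * rng.normal(size=N)

def grad_L(theta, idx):
    """Mean gradient of the squared-error loss over the points in `idx`."""
    Xb, yb = X[idx], y[idx]
    return 2.0 * Xb.T @ (Xb @ theta - yb) / len(idx)

def sgd(step=0.05, batch_size=10, iters=2000):
    theta = np.zeros(d)
    for _ in range(iters):
        # Approximate gradient from only a few data points,
        # instead of the full sum over all N terms in (1).
        idx = rng.integers(0, N, size=batch_size)
        theta -= step * grad_L(theta, idx)
    return theta

theta_hat = sgd()
print(np.linalg.norm(theta_hat - theta_true))
```

Each SGD iteration here costs O(batch_size * d) instead of O(N * d) for the full gradient, which is the trade-off the paragraph describes: more, cheaper, noisier steps within the same budget.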