Appendix for: Invertible Gaussian Reparameterization

Neural Information Processing Systems

1 Computing the determinant of the Jacobian of the softmax++

As mentioned in section 3.1, we can use the matrix determinant lemma to efficiently compute the [...]

Proof: For k = 1, ..., K - 1, we have: P(H = k) = [...]

As mentioned in the main manuscript, our VAE experiments closely follow Maddison et al.

RELAX builds upon equation 16 to develop an estimator with reduced variance. Third, it should also be noted that the bias and variance of the gradient estimator of RELAX are central points of discussion by Grathwohl et al. We show results of running IGR and GS with and without RELAX in Table 2.

Table 2: Test log-likelihood on MNIST for nonlinear architecture.

Discrete Models    MNIST
IGR-I              -94.18
GS                 -103.80
IGR-I + RELAX      -81.95
GS + RELAX         -83.41

Results are in Table 3 and we can see that again, IGR performs best.
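For reference, the matrix determinant lemma cited in this section states that det(A + u v^T) = (1 + v^T A^{-1} u) det(A) for any invertible A. The following is a minimal Python sketch, not the authors' code, that checks the lemma on a diagonal-plus-rank-one matrix. It assumes, for illustration only, a Jacobian-like matrix of the form diag(g) - g g^T; that form, the function names, and the numeric values are hypothetical and are not taken from the text.

```python
import math


def det_dense(matrix):
    """Reference determinant via Gaussian elimination with partial pivoting, O(n^3)."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    det = 1.0
    for col in range(n):
        # Pick the row with the largest entry in this column as pivot.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def det_diag_plus_rank_one(d, u, v):
    """det(diag(d) + u v^T) via the matrix determinant lemma, O(n).

    Requires every d[i] to be nonzero so that diag(d) is invertible.
    """
    correction = 1.0 + sum(vi * ui / di for di, ui, vi in zip(d, u, v))
    return math.prod(d) * correction


def build_dense(d, u, v):
    """Materialise diag(d) + u v^T as a list of rows."""
    n = len(d)
    return [[(d[i] if i == j else 0.0) + u[i] * v[j] for j in range(n)]
            for i in range(n)]


# Hypothetical values: g is a positive vector whose entries sum to less than 1.
g = [0.5, 0.2, 0.1]
d, u, v = g, [-x for x in g], g  # diag(g) - g g^T

fast = det_diag_plus_rank_one(d, u, v)
slow = det_dense(build_dense(d, u, v))
print(fast, slow)
```

For this choice of d, u and v the lemma reduces to prod(g) * (1 - sum(g)), so the determinant costs O(n) rather than the O(n^3) of a dense factorisation. In practice one would likely work with the log-determinant to avoid underflow when n is large.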
