models requested by reviewers

Neural Information Processing Systems 

We thank the reviewers for their suggestions.

Closely following the techniques used in (Tucker et al. 2017; Grathwohl

RELAX requires gradients from a (learned) surrogate function.

DisARM, evaluate only the parts of the model selected by the discrete gates.

The authors of ARM released an extension ARSM (Yin et al. 2019) for categorical variables and the same

However, this would require extending DisARM to the categorical case.

ELBO on the training set (left), the 100-sample bound on the test set (middle), and the variance of the gradient estimator (right).